feature: improve provenance and make q2-preview editable by gordonwoodhull · Pull Request #231 · quarto-dev/q2

gordonwoodhull · 2026-05-22T00:00:25Z

Draft PR for CI.

Provenance epic plans 3-7 are complete; provenance data is flowing and impossible edits to atomic elements are both blocked on the front end and soft-dropped by the incremental writer.

Next up:

Plans 7a-7c: more testing and a soft drop warning for attempting to edit before first render
Plan 8: use a custom node for include shortcode
Plan 9: expose provenance of YAML values used in transforms/meta shortcode
Plan 10: consistent provenance of Lua-produced content, so that you could, e.g., cmd-click some content and go to the line of Lua code that produced it.

The hub-client-e2e.yml `paths:` filter only fires the workflow when a commit touches `hub-client/**` or the workflow file itself. It does not follow transitive Rust deps, so PRs that modify upstream crates the WASM bundle depends on (`quarto-core`, `quarto-pandoc-types`, `quarto-source-map`, `pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc.) silently skip e2e.
Two recent misses:
- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and `pollster::block_on` introduced in `quarto-core` broke 8 hub-client WASM tests on main. e2e never ran because the change was under `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently skipped on every push despite the PR materially changing the WASM bundle's behavior.
Fix: drop the `paths:` filter outright and match the trigger shape of the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`). Also adds a `concurrency:` block (lifted from `test-suite.yml`) so superseded runs on a PR get cancelled in flight, which keeps the runner cost from compounding.
Closes bd-izh3. The original ask there was to add a PR trigger with a *broader* path filter; that approach still wouldn't catch the upstream-crate case, so we go the coarser route the issue's spirit calls for. The runner-sizing open question in bd-izh3 is also resolved: ae8274a confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the full suite in 5.3-8.1 min.
`kyoto` deliberately omitted from the branch list: `origin/kyoto` last moved 2026-02-02 and is 825 commits behind main; the sibling workflows still reference it but that's cargo-cult.

…nce) bd-izh3 closed by 016894a on feature/provenance (PR #231). The patch drops the hub-client-e2e.yml path filter outright so the workflow fires on every PR like the sibling heavy workflows — strictly broader than the original 'add PR trigger with broader filter' proposal, since path filters can never follow transitive Rust deps. Incidental: bd-cxara has its 'source_repo_path' field stripped (was a stale absolute path from shikokuchuo's local clone; harmless flush).

Audit and revise Plans 3-8 of the q2-preview series (now framed internally as the provenance epic) after a design discussion that followed the q2-preview pipeline and attribution work landing on main.
Major design changes folded into the plans:
- **Plan 4 unified Generated variant.** Collapse the earlier `Synthetic` + `Derived` split into one `Generated { by, anchors: Vec<Anchor> }` shape. Atomicity is per-`by.kind` (orthogonal to anchors); the invocation source byte range is the first anchor with role `AnchorRole::Invocation`. One wire-format code (4) instead of two.
- **Plan 4/5/6 typed anchors (Path C).** Instead of stuffing source-info chain metadata into `by.data` (dynamic JSON), the chain is a typed `Vec<Anchor>` where each `Anchor` carries an `Arc<SourceInfo>` and a role-labeled `AnchorRole` (`Invocation`, `ValueSource`, `Other(String)`). `by.data` shrinks to per-kind non-source-info configuration. Two future-anchor roles flagged as follow-ups contingent on metadata-loader and Lua-file-registration work.
- **Plan 6 uniform shortcode anchor stamping.** Single funnel covers Rust built-ins, Lua-loaded extension handlers, and user-extension shortcodes uniformly via a post-walk `stamp_shortcode_anchors` helper. Enrichment-via-post-walk preserves Lua-attached `by.data` fields (lua_path, lua_line) while promoting `by.kind` to `shortcode`. Attribution interaction documented: multi-author shortcodes get latest-wins via the existing `query_byte_range` max-time logic composed with chain-walking through the `Invocation` anchor.
- **Plan 5 latent code-3 bug now reachable.** Plans 1-2 shipped the q2-preview pipeline that runs filters whose output crosses the JSON boundary; the FilterProvenance code-3 round-trip bug is no longer latent in production. Added end-to-end production-reachability regression test using the `{{< kbd Ctrl+C >}}` fixture (kbd.lua constructs a Span that gets FilterProvenance-tagged and then shortcode-stamped). Drops code 5 from the design.
- **Plan 7 SPA edit-back in scope.** The new q2 preview CLI command serves a separate SPA from ts-packages/preview-renderer; both hub-client and the SPA share the writer machinery via @quarto/preview-runtime. Plan 7 now covers replacing `noopSetAst` in the SPA with a real handler that routes through `incrementalWriteQmd` to `syncClient.updateFileContent` and the ephemeral hub's automerge↔disk bridge. Adds a small SPA-local `DiagnosticStrip` for Q-3-42/Q-3-43; hub-client's existing diagnostics-banner handles the same warnings there. Single-file mode (bd-tnm3k) works through the same automerge stack, with no special case.
- **Plan 8 wrapper stays Original.** Explicit reasoning added for why `CustomNode("IncludeExpansion")` uses Original source_info (CustomNode.type_name carries generator identity; the wrapper substitutes 1:1 for the source-mapped Paragraph). HTML pipeline resolve transform in the Normalization Phase (symmetric with CalloutResolveTransform); HTML doesn't attribute the include line because there's no DOM anchor for it, which is accepted v1 behavior.
Mechanical changes also folded in:
- Rename `Synthetic` → `Generated` throughout the type vocabulary in all plans.
- Update JS-side hand-mirror file paths (`hub-client/src/utils/...` → `ts-packages/preview-renderer/src/utils/...`) to reflect the Phase-D package split.
- Each plan's intro reframed as part of the provenance epic; file names keep the q2-preview-plan-N form for continuity.
File renames for clarity about which filters each plan covers:
- `…plan-3-filter-idempotence.md` → `…plan-3-builtin-filter-idempotence.md`
- `…plan-7a-filter-idempotence.md` → `…plan-7a-user-filter-idempotence.md`
Plans 3-8 remain in design state on this branch; no code changes yet.
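
As a rough illustration of the unified `Generated` shape and typed anchors described in this commit, the sketch below uses reduced stand-in types. The `Original` fields, the `By` struct body, the `invocation_anchor` method and `example_shortcode_info` are assumptions made for the example, not the plan's actual definitions.

```rust
use std::sync::Arc;

/// Role label for an anchor; the three roles are the ones named in the plan.
#[derive(Debug, Clone, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

/// Hypothetical, reduced generator descriptor (the real `by` also carries data).
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: String,
}

/// Reduced stand-in for the source-info type.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// A byte range typed by an author (field names are assumptions).
    Original { file_id: usize, start: usize, end: usize },
    /// Content produced by a generator, with a typed anchor chain.
    Generated { by: By, anchors: Vec<Anchor> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Anchor {
    pub role: AnchorRole,
    pub source_info: Arc<SourceInfo>,
}

impl SourceInfo {
    /// The invocation source is the first anchor whose role is `Invocation`.
    pub fn invocation_anchor(&self) -> Option<&Anchor> {
        match self {
            SourceInfo::Generated { anchors, .. } => anchors
                .iter()
                .find(|a| a.role == AnchorRole::Invocation),
            SourceInfo::Original { .. } => None,
        }
    }
}

/// Builds a shortcode-generated value with a value-source anchor followed by
/// an invocation anchor, to show that lookup is by role, not by position.
pub fn example_shortcode_info() -> SourceInfo {
    let value = Arc::new(SourceInfo::Original { file_id: 0, start: 2, end: 8 });
    let call = Arc::new(SourceInfo::Original { file_id: 0, start: 10, end: 28 });
    SourceInfo::Generated {
        by: By { kind: "shortcode".to_string() },
        anchors: vec![
            Anchor { role: AnchorRole::ValueSource, source_info: value },
            Anchor { role: AnchorRole::Invocation, source_info: call },
        ],
    }
}
```

Keeping the role on each anchor, rather than in untyped JSON, is what lets a lookup like this stay a plain iterator search.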

Audit pass over the provenance epic's idempotence story, scoping Plan 3 to pipeline non-determinism only and propagating the consequences to the neighbouring plans.
Plan 3 (builtin transform and filter idempotence):
- Retitle to "Built-in transform and filter idempotence verification", symmetric across Rust transforms and Lua filters (prior framing was too narrow).
- Enumerate the actual universe under test: 36 Rust transforms in build_q2_preview_transform_pipeline (4 excluded, named with reasons), ~20 stage-level items in build_q2_preview_pipeline_stages, and the one Lua filter under resources/extensions/ (video-filter.lua). The prior "~10-20 filters" estimate misread shortcodes as filters.
- Drop the "Plan 3 strengthening" round-trip amendment that was added alongside Plan 7a in commit 2129d35. Round-trip non-idempotence is not exercised by today's pipeline; CI-time round-trip testing conflates writer-lossiness with filter-non-idempotence; 7a's runtime check is the better home for the property when Plan 7's writer ships. Trim "Two flavors" section to a pointer at 7a.
- Add compute_meta_hash_fresh / compute_meta_hash_fresh_excluding_rendered as a new helper in quarto-ast-reconcile, parallel to the existing block hasher. Hash covers blocks + meta (excluding rendered.*).
- Rewrite test pseudocode against the real run_pipeline API at pipeline.rs:626.
- Add fixture-format constraint: no executable engine cells (CI has no kernels).
- Coverage gap audit: ~25 fixtures across the document-level, Lua shortcode, website-project, attribution, and resource categories. Includes lua-shortcode-version, lua-shortcode-lipsum-fixed (non-random path), and video-filter-header for the one built-in Lua filter.
- Convert to a development-plan format with a seven-phase work-items checklist.
- Close the engine-staleness open question via filter.rs:158 (fresh Lua::new() per invocation).
- Clarify the lua-filter-pipeline reference as TypeScript Quarto porting material, not the Rust inventory.
Plan 6 (provenance audit):
- Add a §Test plan bullet for source_info determinism: Plan 3's hashes exclude source_info by design, so a per-fixture source_info-equality check is Plan 6's own responsibility.
Plan 7 (incremental writer):
- Add a writer-lossless baseline test as the first §Test plan bullet, prerequisite for the reconciler tests. Reuses Plan 3's fixture set.
- Add Plan 3 to §References and §Dependencies (soft-depends-on via compute_meta_hash_fresh).
Plan 7a (runtime user-filter idempotence):
- Remove all references to the now-deleted "Plan 3 strengthening" section (five locations including a full subsection).
- Reframe the out-of-scope bullet from "Strengthening Plan 3" to "Extending the runtime round-trip check to built-in filters," with three-point v1-acceptance reasoning in §Notes.
- Update §Design decisions, §Dependencies, and §References to reflect the new shape and the shared compute_meta_hash_fresh helper.
- Add the meta-hash comparison to step 4 of the round-trip check.
No code changes; design state only.

…ailure policy
Hash helper: `merge_op` participates (verified `MergeOp::default() = Concat` is a stable compile-time constant); `Map` entries hashed in insertion order, no sort (an idempotence test should *catch* the kind of HashMap-iteration-order non-determinism a sort would mask). Adds regression-guard unit tests for both choices.
Test runner: drives every fixture through both `DriveMode::SingleFile` (direct `run_pipeline`) and `DriveMode::ProjectOrchestrator` (`ProjectPipeline<RenderToPreviewAstRenderer>`) so orchestrator-only non-determinism (project discovery, ProjectIndex assembly, file-iteration order) is also under test. Website/chrome fixtures are orchestrator-only by design.
Failure policy: failing fixtures stay **failing**, with no auto-`#[ignore]`. Each failure files a beads issue whose description doubles as a sub-agent investigation prompt. The integration branch holds the queue; merge to main waits until drained or the user explicitly opts to ignore.
New helper `find_first_divergence` (alongside the hashers) returns `DivergencePoint::{Block { index }, MetaKey { path }, None}` so the test driver's panic message (and therefore the sub-agent prompt) arrives with a concrete starting point instead of just "hash diverged."
Orchestrator-mode `DocumentAst` extraction: researched the data flow; the typed AST is materialized inside `render_qmd_to_preview_ast` but discarded after JSON serialization. Plan recommends adding `pub ast: DocumentAst` to `PreviewAstOutput` and forwarding through `WasmPassTwoOutput`; alternatives (JSON re-parse, test-only hook) documented with their costs.
Fixture rules: no absolute process paths in fixture content (built-in extensions extract to a `temp_dir` whose path differs across CI runs; it is stable within a single process, which is fine for two-runs-compare but a latent issue for future stored-snapshot variants).
Smaller corrections:
- `Format::from_format_string("q2-preview")` (no `Format::q2_preview()` constructor exists);
- `apply_lua_filter` (singular) is the per-filter Lua-state-creation site, with the plural loop calling it once per filter;
- `LuaShortcodeEngine::new` is the shortcode-side analogue;
- `quarto/video` filter extension is built-in via `include_dir!(resources/extensions)` and auto-discovered by `StageContext::new`, so fixtures need no scaffolding beyond `filters: [video]` in YAML;
- `meta.rendered.includes.*` is the actual path (not `meta.includes.*`) and includes contributions from `IncludeResolveStage`, chrome render transforms, `attribution_viewer`, and Bootstrap/clipboard injection, all skipped by `compute_meta_hash_fresh_excluding_rendered`.
Stage-inventory clarifications: `MathJsStage` is excluded from q2-preview; `BootstrapJsStage` and `ClipboardJsStage` write only to `ctx.artifacts` (not to `meta` or `blocks`), so they don't affect the hash, but their q2-preview inclusion is questionable and is filed separately as bd-2ag1c.
Notes for the next traversal: `CodeHighlightStage`'s native disk scan for user grammars is OS-order-dependent (not exercised today; fixtures don't supply user grammars); lipsum's module-load `math.randomseed(os.time())` is harmless on the non-random code path the fixture exercises but should be reverified if a future variant routes through `math.random`.
Estimated scope: ~760 → ~980 lines.
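
The no-sort choice for `Map` entries can be illustrated with a minimal sketch. It assumes a reduced `Value` type and the standard library's `DefaultHasher`; neither is the real `ConfigValue` or the real hasher, and `meta_hash` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Reduced stand-in for a metadata value (hypothetical, not `ConfigValue`).
pub enum Value {
    Str(String),
    /// Entries are kept in insertion order, as a Vec of key/value pairs.
    Map(Vec<(String, Value)>),
}

fn hash_value(value: &Value, hasher: &mut DefaultHasher) {
    match value {
        Value::Str(s) => {
            0u8.hash(hasher); // variant tag
            s.hash(hasher);
        }
        Value::Map(entries) => {
            1u8.hash(hasher); // variant tag
            entries.len().hash(hasher);
            // Deliberately no sort: a transform that emits keys in an
            // unstable order must produce a different hash.
            for (key, child) in entries {
                key.hash(hasher);
                hash_value(child, hasher);
            }
        }
    }
}

/// Hashes a value so that key order is part of the result.
pub fn meta_hash(value: &Value) -> u64 {
    let mut hasher = DefaultHasher::new();
    hash_value(value, &mut hasher);
    hasher.finish()
}

/// Convenience constructor for a two-entry map of strings.
pub fn two_entry_map(first: (&str, &str), second: (&str, &str)) -> Value {
    Value::Map(vec![
        (first.0.to_string(), Value::Str(first.1.to_string())),
        (second.0.to_string(), Value::Str(second.1.to_string())),
    ])
}
```

Sorting keys before hashing would make the two orderings of the same map indistinguishable, which is exactly the class of bug the gate is meant to surface.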

…branch policy
Audit pass against current source. Settles every open question that remained in the prior revision and corrects factual drift.
Reuse over rebuild
- `DriveMode::ProjectOrchestrator` now delegates to the existing `render_active_page_preview` helper at `crates/quarto-core/tests/render_page_in_project.rs:660`. No fresh orchestrator wiring; no `make_website_project_ctx(...)` builder.
- `DocumentAst` extraction settled on option (a): re-parse the JSON via `pampa::readers::json::read`. source_info round-trips but the hash excludes it, so no stripping pass and no production plumbing change is required. Earlier option (b) (typed-AST plumbing through `PreviewAstOutput` / `WasmPassTwoOutput`) abandoned.
- `run_orchestrator` code sample updated: real body in place of the prior `unimplemented!("see Open questions")` stub.
Test crate location pinned
- File: `crates/quarto-core/tests/idempotence.rs`.
- Fixtures: `crates/quarto-core/tests/fixtures/idempotence/`.
- Cargo invocation in the sub-agent prompt template updated to `--test idempotence`.
Long-lived branch policy made explicit
- New `## Long-lived branch policy` section at the top.
- `## Goal` clarifies that "CI-enforced" applies when the plan lands on `main`; until then `feature/provenance` is allowed to be red while the failure queue drains.
- `### Phase 5 — Failure triage` opens with the same constraint.
Factual fixes against current source
- Transform count corrected from 36 to 37; missing `table-bootstrap-class` added to Finalization, with a fixture entry in the gap audit and Phase 4 checklist.
- `Q2_PREVIEW_STAGE_EXCLUDED` corrected to list all three exclusions (`math-js`, `render-html-body`, `apply-template`).
- `CodeHighlightStage` user-grammar scan citation moved from `pipeline.rs:644-650` to `crates/quarto-core/src/transforms/code_highlight.rs:126-129`.
- Stale line numbers refreshed throughout (pipeline.rs 1181→1198, 1220→1237, 379→380, 355→356, 626→627, 855→859, 663→664; render_page_in_project.rs 653→660; Pass2Payload::AstJson 256→254; stage/context.rs 220→221; ShortcodeResolveTransform::transform 257→513 with the correct file path).
- bd-2ag1c ordering pinned: Plan 3 lands first; bd-2ag1c follows with Plan 3's measurements in hand.
Section rename: "Open questions for implementation" → "Decisions (was: open questions)" + a `### CI failure policy & sub-agent prompt template` subsection. All internal cross-refs updated.
Estimate revised
- Scaffolding line item: ~260 → ~100 lines (reuse, not rebuild).
- `PreviewAstOutput::ast` plumbing (~20 lines) removed entirely.
- Total: ~980 → ~800 lines.
- Session count revised 2 → 2-3 with the third explicitly allocated to Phase 5 triage.

Adds the structural-hash infrastructure that Plan 3's q2-preview idempotence gate (and Plan 7a's runtime user-filter check) will sit on:
- compute_meta_hash_fresh: source-info-agnostic ConfigValue hasher. Insertion-order Map keys (no sort, so HashMap-iteration-order bugs in transforms remain detectable). MergeOp participates via its enum discriminant. Recurses into PandocInlines/PandocBlocks via the existing inline/block hashers (which already exclude source_info).
- compute_meta_hash_fresh_excluding_rendered: same, but skips the top-level `rendered` map entry. The exclusion is intentionally not propagated into recursion: a nested `rendered` key is content.
- find_first_divergence + DivergencePoint: returns the first block index whose per-block fresh hash differs, or the first insertion-order meta key path whose subtree hash differs (with the same rendered.* exclusion). The plan-sketch signature took &DocumentAst, but quarto-ast-reconcile cannot depend on quarto-core; the helper takes &[Block] + &ConfigValue and the test driver projects from DocumentAst.
- 11 new unit tests cover: same/different content, source_info/key_source agnosticism, top-level rendered exclusion, nested rendered participation, Map insertion-order sensitivity (no-sort regression guard), MergeOp sensitivity; identical/Block-mismatch/MetaKey-path/rendered-skip divergence localization.
Verification: `cargo nextest run --workspace`: 9321 passed, 196 skipped. `cargo xtask verify --skip-hub-build` steps 1–5 green (lint, fmt, Rust build with -D warnings, tree-sitter, Rust tests with -D warnings). Steps 7/10 fail with the known --skip-hub-build artifact (`wasm-quarto-hub-client` unbuilt), unrelated to these additive Rust changes.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
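
To make the divergence-localization behaviour concrete, here is a minimal sketch over simplified inputs. It assumes blocks are plain strings and meta is a flat list of top-level key/value strings; the real helper takes `&[Block]` and `&ConfigValue` and compares hashes, so the signature and the `String` path below are assumptions for illustration only.

```rust
/// Where two pipeline outputs first differ.
#[derive(Debug, PartialEq)]
pub enum DivergencePoint {
    Block { index: usize },
    MetaKey { path: String },
    None,
}

/// Drops the top-level `rendered` entry, keeping insertion order.
fn without_rendered(meta: &[(String, String)]) -> Vec<&(String, String)> {
    meta.iter()
        .filter(|entry| entry.0.as_str() != "rendered")
        .collect()
}

/// Reports the first differing block, else the first differing meta key.
pub fn find_first_divergence(
    blocks_a: &[String],
    meta_a: &[(String, String)],
    blocks_b: &[String],
    meta_b: &[(String, String)],
) -> DivergencePoint {
    // Blocks are checked first, position by position.
    let shared = blocks_a.len().min(blocks_b.len());
    for index in 0..shared {
        if blocks_a[index] != blocks_b[index] {
            return DivergencePoint::Block { index };
        }
    }
    if blocks_a.len() != blocks_b.len() {
        return DivergencePoint::Block { index: shared };
    }

    // Meta keys are walked in insertion order, ignoring `rendered`.
    let a = without_rendered(meta_a);
    let b = without_rendered(meta_b);
    for i in 0..a.len().max(b.len()) {
        match (a.get(i), b.get(i)) {
            (Some(x), Some(y)) if x == y => continue,
            (Some(x), _) => return DivergencePoint::MetaKey { path: x.0.clone() },
            (None, Some(y)) => return DivergencePoint::MetaKey { path: y.0.clone() },
            (None, None) => break,
        }
    }
    DivergencePoint::None
}
```

Returning a position rather than a boolean is what lets a failing test message point an investigator at a specific block or key.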

Adds the test driver that Phases 3-4 will hang ~25 fixtures off. Self-contained at `crates/quarto-core/tests/idempotence.rs`.
- `DriveMode { SingleFile, ProjectOrchestrator }`. Single-file calls `run_pipeline` with `build_q2_preview_pipeline_stages`. Orchestrator drives `ProjectPipeline<RenderToPreviewAstRenderer>` via the existing `render_active_page_preview` body (copied inline because each `tests/*.rs` is its own binary).
- `Fixture { name, setup, active, modes }` + `run_fixture` runs the pipeline twice per (fixture, mode), hashes blocks via `compute_blocks_hash_fresh` and meta via `compute_meta_hash_fresh_excluding_rendered`, and on divergence panics with `find_first_divergence`'s `DivergencePoint` embedded so the panic message itself fills the plan's sub-agent investigation prompt template.
- `pandoc_to_document_ast` is the small field-shuffle that the plan identifies: orchestrator mode emits `Pass2Payload::AstJson`, which `pampa::readers::json::read` re-parses into `(Pandoc, ASTContext)`; the hasher only reads `ast.blocks` + `ast.meta` so the other `DocumentAst` fields get defaults.
- `tests/fixtures/idempotence/README.md` documents the fixture-format rules (no engine cells, no absolute paths, per-fixture mode mapping).
- `smoke_plain_paragraph` smoke fixture drives a single-paragraph document through both modes. Passing this proves the harness works end-to-end before Phases 3-4 land the real fixtures.
Verification: `cargo nextest run -p quarto-core --test idempotence` runs the new smoke test (PASS). `cargo xtask verify --skip-hub-build --skip-hub-tests` steps 1-9 green; the Phase-1 idempotence tests and this Phase-2 smoke test ran inside Step 5. Step 10 (preview-renderer integration tests in `ts-packages/preview-renderer/`) fails with the same WASM-import artifact as Step 7; both depend on `wasm-quarto-hub-client`, which `--skip-hub-build` skips. Unrelated to these Rust-only additions.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
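
The run-twice-and-compare shape of the driver can be sketched as follows. This is a simplified model: the pipeline is an arbitrary closure returning strings, the hash is the standard library's `DefaultHasher`, and `check_idempotent` returns a `Result` instead of panicking so it is easy to exercise. All of these are assumptions for the example, not the code in `idempotence.rs`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// Reduced fixture: a name plus the modes it should be driven through.
pub struct Fixture {
    pub name: &'static str,
    pub modes: &'static [DriveMode],
}

fn hash_output(blocks: &[String]) -> u64 {
    let mut hasher = DefaultHasher::new();
    blocks.hash(&mut hasher);
    hasher.finish()
}

/// Runs `pipeline` twice per mode and reports the first (fixture, mode)
/// pair whose two runs hash differently.
pub fn check_idempotent<F>(fixture: &Fixture, mut pipeline: F) -> Result<(), String>
where
    F: FnMut(DriveMode) -> Vec<String>,
{
    for &mode in fixture.modes {
        let first = hash_output(&pipeline(mode));
        let second = hash_output(&pipeline(mode));
        if first != second {
            return Err(format!(
                "fixture `{}` diverged in {:?}",
                fixture.name, mode
            ));
        }
    }
    Ok(())
}

pub const BOTH_MODES: &[DriveMode] =
    &[DriveMode::SingleFile, DriveMode::ProjectOrchestrator];
```

Driving each fixture through both modes matters because a pipeline can be stable when called directly yet unstable once project-level orchestration is involved.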

Adds the existing-fixture batch the plan calls "carry-forward from prior plan draft": one fixture per Rust transform / feature that was already exercised in earlier idempotence drafts, scoped to single-file document fixtures that run in both DriveMode variants.
Coverage:
- meta-single, meta-markdown: shortcode-resolve + metadata-normalize (string and PandocInlines branches).
- include-trivial: include-expansion stage + shortcode-resolve.
- callout-warning: CalloutTransform (callout-resolve is excluded from q2-preview, so the CustomNode survives).
- theorem: TheoremSugarTransform.
- figure-ref-target: FloatRefTargetSugarTransform.
- crossref-to-theorem: crossref-index + crossref-resolve.
- sectionize-multi: SectionizeTransform across nested headers.
- footnotes-mixed: FootnotesTransform on inline + reference forms.
- appendix-license: AppendixStructureTransform with license/copyright meta and a footnote interaction.
- combined-stress: sectionize + callouts + shortcodes interacting.
A `doc_fixture(name, content)` helper collapses each single-file fixture to a one-liner; `include-trivial` keeps an inline closure because it writes two files.
All 12 idempotence tests (smoke + 11 new) pass: `cargo nextest run -p quarto-core --test idempotence` → 12 passed. No queue entries for Phase 5 from this batch; the carry-forward fixtures are all clean on first run.
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

npm install (from repo root) and npm run build:wasm (from hub-client) updated package-lock.json and crates/wasm-quarto-hub-client/Cargo.lock on this branch. Committed so subsequent fresh checkouts of feature/provenance can build WASM from the same dependency set.

Adds the batch of Phase-4 fixtures that need no scaffolding beyond a single-file `setup`. Per the long-lived-integration-branch policy, fixtures that surface non-idempotence stay in the suite as the triage queue.
Pass on first run (both DriveModes):
- code-block-fenced: code-block-generate / -render / code-highlight.
- proof: ProofSugarTransform.
- equation-labeled: EquationLabelTransform + crossref-resolve (eq).
- toc-on: toc-generate, toc-render.
- video-filter-header: built-in Lua filter under `resources/extensions/quarto/video/`.
- theme-bootstrap: compile-theme-css stage.
- table-bootstrap-class: TableBootstrapClassTransform.
- lua-shortcode-version: Lua-loaded shortcode handler (returns `quarto.version`).
In the queue:
- **lua-shortcode-lipsum-fixed**: `SingleFile` passes; the pipeline itself is idempotent. `ProjectOrchestrator` panics with `MalformedSourceInfoPool` re-parsing the AST JSON the orchestrator emitted. This is a JSON writer/reader round-trip bug specific to lipsum-shortcode-generated inlines, not a transform-determinism finding. Filed as **bd-3odjm**. The test stays red per the plan's "do not #[ignore]" rule; the integration branch is allowed to carry the failure until the queue is drained.
Verification: `cargo nextest run -p quarto-core --test idempotence` → 20 passed, 1 failed (bd-3odjm). Plan-1 unit tests and Phase-3 fixtures all green.
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm

Both pass on first run in both DriveMode variants.
- include-in-header writes a tiny header.html and references it from front matter; exercises IncludeResolveStage.
- resource-image writes a 67-byte minimal PNG and references it via inline image syntax; exercises ResourceCollectorTransform. Adds a write_bytes helper for the binary stub. Per the fixtures README rule the PNG sits at the project root and is referenced relatively (`./local.png`).
Verification: `cargo nextest run -p quarto-core --test idempotence` → 22 passed, 1 failed (bd-3odjm).

Three orchestrator-only website fixtures. Two pass, one in queue.
Pass:
- website-chrome: navbar + sidebar + page-navigation + page-footer + favicon + bootstrap-icons + canonical-url + title-prefix. Two pages (index, other), tiny favicon stub.
- website-listing: listing with categories enabled and feed: true, two posts under posts/, each with categories. Exercises listing-generate / -render, categories-sidebar, listing-feed-link, listing-feed-stage, listing-item-info.
In the queue:
- website-links: internal cross-page `.qmd` body links. Filed as bd-rz2we. Block 0 hash diverges across runs while meta hash is stable, so the divergence is genuinely in the AST blocks (not in rendered chrome). Hypothesis: link-rewrite or link-resolution is capturing the absolute project root (or canonicalized tempdir path) into the AST when it should emit a path-independent relative URL.
Verification: `cargo nextest run -p quarto-core --test idempotence` → 24 passed, 2 failed (bd-3odjm, bd-rz2we).
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-rz2we

Extends Fixture with an optional attribution_json: Option<&'static str>. When present:
- SingleFile installs PreBuiltAttributionProvider on RenderContext.attribution_provider before run_pipeline.
- ProjectOrchestrator forwards the JSON via RenderToPreviewAstRenderer::with_attribution; the renderer installs the same provider type on the per-page RenderContext it constructs internally.
Stub JSON has one actor + one run covering bytes 0..1024 (a wider range than the fixture body actually uses) so the attribution map overlaps the entire document and AttributionGenerateStage + AttributionRenderTransform have something to write into the AST.
`cargo nextest run -p quarto-core --test idempotence` → 25 passed, 2 failed (bd-3odjm, bd-rz2we; both pre-existing). attribution_basic passes on first run in both DriveModes, so the deterministic provider + generate + render stack is genuinely idempotent.
This completes the Phase 4 fixture set. The Plan-3 gate now covers:
- 1 smoke fixture
- 11 carry-forward (Phase 3, all green)
- 9 Phase-4a doc fixtures (8 green, 1 in queue)
- 2 Phase-4b multi-file (both green)
- 3 Phase-4c website (2 green, 1 in queue)
- 1 Phase-4d attribution (green)
Total: 27 fixtures, 25 green, 2 in queue.
Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm (Plan 5 will fix), bd-rz2we

Adds claude-notes/instructions/idempotence-contract.md, the author-facing summary of the contract Plan 3 enforces. Covers:
- what the hash includes and excludes (source-info blind, insertion-order maps, merge_op participates, rendered.* excluded at top level only);
- what new transforms must NOT do (undefined iteration order, process-local state, absolute paths, engine cells);
- the fresh-Lua-state-per-run rule for Lua filters / shortcodes;
- how to add a fixture (doc_fixture for trivial, inline closure for multi-file, ORCHESTRATOR_ONLY for chrome, attribution_json for attribution exercises);
- the long-lived-integration-branch policy: don't #[ignore] a failing fixture without explicit user approval.
Cross-linked from:
- crates/quarto-core/tests/fixtures/idempotence/README.md (existing pointer expanded to point at the contract doc and the plan).
- claude-notes/plans/2026-05-04-q2-preview-plan-7a-user-filter-idempotence.md (References section; authors looking at the runtime user-filter check find the CI contract too).
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

cargo nextest run --workspace: 9346/9348 pass. The 2 failures are the documented queue items (bd-3odjm, bd-rz2we); every other workspace test is green, including the 25 passing idempotence fixtures.
cargo xtask verify (full WASM stack): Steps 1-4 green; Step 5 fails on the same 2 fixtures. That's the expected long-lived-integration-branch state per the plan's §Long-lived branch policy: the gate is allowed to be red until the queue is drained.
Plan 3 is complete as a deliverable: gate + hashing infrastructure + 27 fixtures + author-facing docs + filed queue. Merge to main gated on draining the queue (bd-3odjm via Plan 5; bd-rz2we via a follow-up).
Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

The Work-items section under Phase 1-7 was fully checked, but the parallel "Coverage gaps to address during implementation" inventory (per-fixture bullets, line ~560+) still showed unchecked boxes even though every fixture in that list now ships in idempotence.rs. Marked all 26 inventory items as landed. Annotated the two that are in the Phase-5 triage queue (lipsum-fixed → bd-3odjm, website-links → bd-rz2we) so the queue state is also visible from the inventory, not just from the Phase-5 work-items block. Plan checklist is now fully consistent: 54 checked, 0 unchecked.

…erContext
Plan 3's website_links fixture was non-idempotent: rendered AST link URLs captured the absolute tempdir path of the per-run TempDir, causing block-0 hash divergence across two runs with different tempdirs.
Root cause: `ResourceResolverContext::vfs_root_mode` played two roles via a single PathBuf: disk-write root (where runtime.file_write puts theme CSS / copied resources) and URL prefix (what gets embedded in HTML link/asset URLs). In production WASM these are intentionally identical; on native they have to diverge so writes hit a real tempdir but URLs stay path-independent.
Split the field into `{ write_root, url_root }` and add a two-arg `vfs_root_with_url_root` constructor plus per-renderer `with_url_root` builder. Single-arg `vfs_root(...)` constructor preserves the WASM identity contract by construction (write_root == url_root).
Native test helpers in tests/idempotence.rs and tests/render_page_in_project.rs now pass `.with_url_root("/.quarto/project-artifacts")`, so rendered URLs embed the synthetic prefix while disk writes still land in the tempdir.
website_links now passes; 25/26 idempotence fixtures pass. The remaining lipsum failure is bd-3odjm (FilterProvenance wire format), owned by Plan 5 and out of scope here. Workspace nextest: 9347/9348. cargo xtask verify (Rust leg) clean for lint/fmt/build with -D warnings.
Plan: claude-notes/plans/2026-05-21-vfs-url-write-root-split.md
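
The write-root / URL-root split can be sketched with a small stand-alone type. `VfsRoots`, `write_path` and `url_for` are hypothetical names, and storing `url_root` as a `String` is an assumption made to keep URL formatting platform-independent; only the two constructor names and the `{ write_root, url_root }` field pair come from the commit.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the split resolver roots.
pub struct VfsRoots {
    /// Where files are physically written (e.g. a per-run tempdir).
    write_root: PathBuf,
    /// Prefix embedded into rendered URLs; must not vary across runs.
    url_root: String,
}

impl VfsRoots {
    /// Single-argument form: both roots are identical by construction.
    pub fn vfs_root(root: &str) -> Self {
        Self {
            write_root: PathBuf::from(root),
            url_root: root.to_string(),
        }
    }

    /// Two-argument form: writes and URLs use different roots.
    pub fn vfs_root_with_url_root(write_root: &str, url_root: &str) -> Self {
        Self {
            write_root: PathBuf::from(write_root),
            url_root: url_root.to_string(),
        }
    }

    /// Disk location for a resource, relative to the write root.
    pub fn write_path(&self, relative: &str) -> PathBuf {
        self.write_root.join(relative)
    }

    /// URL for the same resource, relative to the URL root.
    pub fn url_for(&self, relative: &str) -> String {
        format!("{}/{}", self.url_root, relative)
    }
}
```

With the roots separated, two runs in different tempdirs write to different places but embed the same URL, which is the property the idempotence hash depends on.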

Plan 4 (SourceInfo provenance types) finalized for development:
- 7-phase work-items checklist (types → constructors → accessor updates → Lua serde → migration → tests → verification gate)
- field renamed `anchors` → `from` (typed `SmallVec<[Anchor; 1]>` from day 1; serde feature required on smallvec)
- accessor semantics for `Generated` pinned: length/start_offset/end_offset → 0, map_offset → None, resolve_byte_range / remap_file_ids / extract_file_id delegate to invocation_anchor
- required-Invocation-anchor invariant on `shortcode` kind documented with `By::shortcode` doc-comment requirement; enforcement split across Plan 6 audit test and Plan 7 debug_assert
- Lua-table discriminant pinned to `t = "Generated"`
- §Test plan and Phase 6 expanded to cover every accessor + mutator + the `combine()` × Generated corner
- migration scope corrected (15 files, 27 occurrences); references and line ranges verified against the worktree source
- §Open questions section removed (no open questions remain)
Cross-plan `from` rename swept across Plans 3, 5, 6, 7, 8.
Plan 5 JSON wire format (option D):
- outer JSON key `anchors` → `from` (matches Rust field name)
- inner anchor pool reference `from` → `si_id` (distinctive; avoids the `parent_id` tree-structure mental model that fits Substring's chain but not anchor references)
- Reader/writer code samples updated; TS-side `SourceInfoEntry` shape note updated
Plan 6 + Plan 7 hand-offs for the required-anchor invariant added. Deferred follow-ups (Dispatch anchor, ValueSource anchor) cross-referenced as bd-36fr9 and bd-129m3 (committed separately to main).
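
The pinned accessor semantics for `Generated` can be read as the following minimal sketch. It uses a plain `Vec` in place of `SmallVec`, a reduced `Original` variant, and a tuple for each anchor; those simplifications, and the exact method signatures, are assumptions rather than the plan's definitions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    Invocation,
    ValueSource,
}

/// Reduced stand-in; the plan's `from` field is a `SmallVec` of anchors.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { start_offset: usize, end_offset: usize },
    Generated { kind: String, from: Vec<(Role, SourceInfo)> },
}

impl SourceInfo {
    /// First anchor with the `Invocation` role, if any.
    fn invocation_anchor(&self) -> Option<&SourceInfo> {
        match self {
            SourceInfo::Generated { from, .. } => from
                .iter()
                .find(|(role, _)| *role == Role::Invocation)
                .map(|(_, info)| info),
            SourceInfo::Original { .. } => None,
        }
    }

    /// Generated content has no extent of its own: start is 0.
    pub fn start_offset(&self) -> usize {
        match self {
            SourceInfo::Original { start_offset, .. } => *start_offset,
            SourceInfo::Generated { .. } => 0,
        }
    }

    /// Generated content has no extent of its own: end is 0.
    pub fn end_offset(&self) -> usize {
        match self {
            SourceInfo::Original { end_offset, .. } => *end_offset,
            SourceInfo::Generated { .. } => 0,
        }
    }

    pub fn length(&self) -> usize {
        self.end_offset() - self.start_offset()
    }

    /// Offsets inside generated content do not map back to source.
    pub fn map_offset(&self, offset: usize) -> Option<usize> {
        match self {
            SourceInfo::Original { start_offset, .. } if offset <= self.length() => {
                Some(start_offset + offset)
            }
            _ => None,
        }
    }

    /// Byte-range resolution delegates to the invocation anchor.
    pub fn resolve_byte_range(&self) -> Option<(usize, usize)> {
        match self {
            SourceInfo::Original { start_offset, end_offset } => {
                Some((*start_offset, *end_offset))
            }
            SourceInfo::Generated { .. } => self
                .invocation_anchor()
                .and_then(|anchor| anchor.resolve_byte_range()),
        }
    }
}
```

A `Generated` value with an empty `from` resolves to `None` here, which lines up with the empty-from behaviour the later Plan 4 notes describe for `extract_file_id`.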

Plan 4 work happens on top of an integration branch carrying exactly one failing test (lua_shortcode_lipsum_fixed orchestrator mode, filed as bd-3odjm). That test's root cause is the wire-format code-3 collision Plan 5 owns, so Plan 4 must not try to fix it locally.
Plan 4:
- New §"Inherited pre-existing failure (bd-3odjm)" section between Out of scope and Work items. Explains the test, the panic shape, the root cause, and that any *other* failure in the idempotence suite is a Plan-4 regression.
- Phase 7 verification gate updated: cargo nextest expects exactly one failure (bd-3odjm); cargo xtask verify trips on the same one.
Plan 5:
- New §"Inherited failure that must close on Plan 5's first reader change (bd-3odjm)" section. Spells out the contract: Plan 5's first reader change must turn lua_shortcode_lipsum_fixed green. If it doesn't, the Plan-5 author has an immediate signal that either the reader discrimination is wrong or the lipsum path produces a code-3 shape neither arm handles; stop and focus on it before moving on.
- Test plan now cites bd-3odjm as the live first-iteration smoke check, ahead of the hand-constructed tests.
Both plans now read consistently with the state of feature/provenance.

Plan 4 committed `from: SmallVec<[Anchor; 1]>` as the field type, but Plan 5's reader/writer + Plan 6's stamper code samples still used the `vec![]` macro to construct it. Those samples would not compile if taken literally: `vec!` produces a `Vec`, not a `SmallVec`.
Switch to `smallvec![]` everywhere `Generated.from` is constructed:
- Plan 5: 4 occurrences (legacy-Transformed code-3 reader; Anchor dedup test description; forward-compat test description; round-trip test description).
- Plan 6: 14 occurrences across §"Per-transform fixes", §"Lua-shortcode enrichment", §"The post-walk helper", §"Variant semantics summary" etc.
No semantic change: same constructions, just the macro that actually returns the field type.

Plan 4 + Plan 5: change Generated.from's inline capacity from SmallVec<[Anchor; 1]> to SmallVec<[Anchor; 2]> so the steady-state post-follow-up shape (Invocation + ValueSource on meta/var; Invocation + Dispatch on Lua-handler shortcodes) stays heap-free. Cost is +16 bytes per empty Generated; saves a heap allocation on every multi-anchor shortcode resolution. Also folds in research findings that were tacit in the previous draft: - Phase 1 smallvec line: replace "or verify present" hedge with the concrete two-file Cargo.toml edit (workspace + quarto-source-map), noting verified-absent. - skip_serializing_if path: use the fully-qualified serde_json::Value::is_null (the short form is a frequent gotcha). - By::raw policy: accept-all; forgery caught by Plan 6 audit + Plan 7 debug_assert, not by constructor rejection. - Anchor ordering: append order, stable across serde, at most one anchor per known role. - extract_file_id: empty-from Generated returns None, matching FilterProvenance's behavior; both call sites in to_ariadne_report already tolerate None. Stays a private fn on DiagnosticMessage. - Lua serde Concat recursion: legacy "FilterProvenance" inside a Concat piece is handled automatically; no .snap/.json fixtures contain the legacy tag. - Default risk: no struct holding SourceInfo derives Default in quarto-pandoc-types; Default for SourceInfo itself stays unchanged. - combine() × Generated: verified unreachable today (all 17 call sites combine Original/Substring shapes); the Phase 6 test documents intent for any future caller. - PartialEq: no production call site compares SourceInfo today; the derive is required by Block/Inline but not load-bearing.

The previous "+16 bytes per Generated" note understated the cost by ~2.5x. Actual delta:
- Anchor = AnchorRole (32 bytes; the String-bearing Other variant dominates) + Arc<SourceInfo> (8) = ~40 bytes.
- SmallVec<[Anchor; 1]> ≈ 48 bytes; SmallVec<[Anchor; 2]> ≈ 88 bytes on the stack, a 40-byte delta per SmallVec field.
- Since SourceInfo is an enum, its stack size is dictated by the largest variant, so every SourceInfo (Original/Substring/Concat too) grows by 40 bytes, not just Generated instances.
Block/Inline carry SourceInfo by value, so the cost multiplies across the AST (tens-to-hundreds of KB on a large doc). Plan keeps cap=2 (the trade is still defensible) but documents the real cost honestly and notes Arc-boxing Generated as the next lever if memory-per-node ever bites the q2-preview editor.
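The "largest variant dictates the enum's size" point can be checked with `std::mem::size_of`. The sketch below uses stand-in types, not the real ones: `Info<N>` is a hypothetical enum whose `Generated` variant holds an inline array of `N` optional anchors in place of a `SmallVec`, and `Arc<()>` stands in for `Arc<SourceInfo>`. Exact byte counts depend on the compiler and target, so only relative sizes are compared.

```rust
use std::mem::{size_of, size_of_val};
use std::sync::Arc;

#[allow(dead_code)]
enum AnchorRole {
    Invocation,
    ValueSource,
    Dispatch,
    Other(String), // the String-bearing variant dominates the role's size
}

#[allow(dead_code)]
struct Anchor {
    role: AnchorRole,
    target: Arc<()>, // stand-in for Arc<SourceInfo>
}

/// Stand-in for SourceInfo with an inline anchor capacity of N.
#[allow(dead_code)]
enum Info<const N: usize> {
    Original { file_id: usize, start: usize, end: usize },
    Generated { from: [Option<Anchor>; N] },
}

/// Returns (size of Anchor, size of Info with cap 1, size of Info with cap 2).
pub fn sizes() -> (usize, usize, usize) {
    (size_of::<Anchor>(), size_of::<Info<1>>(), size_of::<Info<2>>())
}

/// Even an `Original` value occupies the full enum footprint.
pub fn original_value_size() -> usize {
    let v = Info::<2>::Original { file_id: 0, start: 0, end: 0 };
    size_of_val(&v)
}
```

Printing `sizes()` on a given target shows how much every node pays for the larger inline capacity, including nodes that never hold an anchor.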

…::unknown Drop the Pandoc-flavored naming. q2 isn't pandoc-centric and the affected call sites aren't all Pandoc (CLI stdin, Lua handoff, external filter binaries). Renames: - json::read_strict + json::read_lenient -> json::read (strict) + json::read_completing_source_info (the new lenient variant). The function name matches the surrounding read_<thing> convention in readers/json.rs (read_inline, read_block, read_attr_source, make_source_info). Says exactly what it does. - By::external_pandoc -> By::unknown. Honest about what we know ("we don't know"), generic enough to cover all four outside-world call sites (qmd-syntax-helper, CLI stdin, external filter, Lua handoff). Pool-slot constants chained via + 1 in writers/json.rs so future reserved slots don't require hardcoded number changes: pub const USER_EDIT_SOURCE_INFO_ID: usize = 0; pub const UNKNOWN_SOURCE_INFO_ID: usize = USER_EDIT_SOURCE_INFO_ID + 1; SourceInfoSerializer::new() pre-pushes the slots in declaration order; a unit test next to the constants asserts the pool entries match, so adding or rearranging slots fails the test rather than silently shifting IDs at consumer sites. The TS hand-mirror follows the same pattern with a Rust-side CI parity test. Provenance-contract.md §2 catalog: drop external_pandoc row, add unknown row noting it's the source_info-completing reader's placeholder. Co-authored-by: Claude <noreply@anthropic.com>

Three coordinated changes to the design doc: 1. Define "authored content" upfront, before the BP formal statement. Replaces "node-local content" everywhere. The new term carries both the structural aspect (excludes descendants' bytes) and the semantic aspect (producer-contract attests user authorship). Pipeline- generated nodes have no authored content by definition; the dispatch routes them to non-emitting rules. (P2) now reads cleanly: "the byte was produced by serializing the authored content of a single AST node n." Reader doesn't have to infer the user-authorship scope from the dispatch table. 2. Add the Completeness section as a dual to Soundness. Four clauses partition every byte: (C1) Preserved - Source bytes still claimed by AST_new appear. (C2) Authored - non-soft-drop nodes' authored content appears. (R) Refused - soft-drop sites refuse authored content + warn. (D) Deleted - bytes no longer claimed don't appear. C-prefix denotes positive completeness (appears in Source'); R/D denote negative cases (doesn't appear). (C1)/(C2) dual (P1)/(P2). "Soft-drop site" defined precisely as "UseAfter or RecurseIntoContainer AND editability gate returns not-editable." R5-special (let-user-win) is explicitly NOT a soft-drop site; it falls under (C2). Proof by structural induction over R1, R1', R2, R2', R5, R3/R4 cases. 3. Rename "What BP does not promise" to "What BP and Completeness do not promise". Reclassify the marker-fidelity / lazy-numbering / block-container shell-regeneration gaps as a single unified completeness gap: helper-emitted bytes don't preserve user-specific syntactic choices. Soundness still holds (helper output is honest authored content via P2); completeness fails for byte-level fidelity of the original syntactic form. Producer-hygiene caveat updated to note both invariants depend on it. 
Plan 7d Phase 4 gains a companion property test: completeness_holds (parse(Source') structurally equivalent to AST_new for non-soft-drop inputs), alongside the existing bp_holds (no atomic-Generated bytes leak). The two properties pin both invariants empirically. Co-authored-by: Claude <noreply@anthropic.com>

Property tests verify every input satisfies the property, but say nothing about which dispatch rules the generator actually exercises. Without coverage assertions, a generator subtly biased toward easy cases (mostly R1, rarely R5-special) gives a false sense of confidence.
Add thread-local DispatchCounters in plan_user_writes, gated behind a dispatch-coverage build feature (zero cost in production). Each dispatch row ticks per visit. Property tests assert per-row minimum coverage after proptest completes; under-exercised rows fail with a specific message naming the row.
Tuned thresholds:
  R1 >= 100              (most common; preserved content)
  R1' (soft-drop) >= 50  (atomic-Generated edit refusal)
  R2 / R2' >= 20         (omit / soft-omit)
  R3-helper >= 50        (new container with helper shells)
  R3-transparent >= 50   (sectionize wrapper recursion)
  R4 >= 30               (inline container preserved shells)
  R5 >= 50               (leaf serialization)
  R5-special >= 20       (let-user-win atomic CustomNode replace)
Keeps the generator honest as the writer evolves: future contributors adding a dispatch row must add a corresponding threshold; a future change that accidentally makes a row unreachable surfaces as a coverage failure rather than passing tests.
Co-authored-by: Claude <noreply@anthropic.com>
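A minimal sketch of the coverage-counter idea, using only the standard library. The function names (`tick`, `reset`, `under_exercised`) and the message format are hypothetical, and the sketch assumes "R2 / R2' >= 20" means each of the two rows needs 20 visits. The real counters would sit behind `#[cfg(feature = "dispatch-coverage")]`; the gate is omitted here so the snippet compiles on its own.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // One counter map per test thread, so parallel tests do not interfere.
    static DISPATCH_COUNTERS: RefCell<HashMap<&'static str, u64>> =
        RefCell::new(HashMap::new());
}

/// Per-row minimum visit counts, copied from the thresholds listed above.
pub const THRESHOLDS: &[(&str, u64)] = &[
    ("R1", 100),
    ("R1'", 50),
    ("R2", 20),
    ("R2'", 20),
    ("R3-helper", 50),
    ("R3-transparent", 50),
    ("R4", 30),
    ("R5", 50),
    ("R5-special", 20),
];

/// Called by the writer each time a dispatch row is taken.
pub fn tick(row: &'static str) {
    DISPATCH_COUNTERS.with(|c| {
        *c.borrow_mut().entry(row).or_insert(0) += 1;
    });
}

/// Clears the counters before a property-test run.
pub fn reset() {
    DISPATCH_COUNTERS.with(|c| c.borrow_mut().clear());
}

/// Returns one message per row that fell short of its threshold.
pub fn under_exercised() -> Vec<String> {
    DISPATCH_COUNTERS.with(|c| {
        let counts = c.borrow();
        THRESHOLDS
            .iter()
            .filter_map(|&(row, min)| {
                let seen = counts.get(row).copied().unwrap_or(0);
                (seen < min).then(|| {
                    format!("dispatch row {row} visited {seen} times, expected >= {min}")
                })
            })
            .collect()
    })
}
```

A property test would call `reset()` first, run the generator, then assert that `under_exercised()` is empty, printing the returned messages on failure.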

Add a framing sub-section at the top of Phase 4 that ties the four testing pieces together as a coordinated strategy: - Generator (gen_pandoc_with_atomic_descendants) — produces ASTs with atomic-Generated descendants at varying depths plus user-edits. Extends the existing quarto-ast-reconcile generators with two new capabilities: atomic-injection at configurable density, and realistic user-edit transformations. - Marker-string convention for soundness (bp_holds) — fresh recognizable marker per iteration injected into atomic-Generated content; one-line assertion that it doesn't appear in Source'. - Structural-equivalence reuse for completeness (completeness_holds) — reuses quarto_ast_reconcile::hash::compute_block_hash (already source-info-blind per hash.rs:498), which absorbs helper- canonicalization gaps at the AST level without bespoke matchers. - Required dispatch-coverage instrumentation — the full spec stays in the work items below; the intro names it as the fourth coordinated piece. Closes the loose thread from the conversation: I had offered to write this sub-section but only landed the coverage-counter piece. The four pieces fit together; the intro makes the fit explicit so a future implementer reading Phase 4 understands the strategy before the work items. Co-authored-by: Claude <noreply@anthropic.com>

Item 1 (Phase 4 — pool intern dedup): The serializer's intern cache is strict Arc-pointer equality at parent edges only; it never dedups top-level intern calls by value. Round-tripped completing-reader nodes will get fresh pool entries structurally equal to the reserved slots. Decision: accept the duplication (option a). Bounded, per-document, cosmetic. Add a one-line comment near intern marking it intentional. Item 2 (Phase 4 — per-caller reader-split verification): All five outside-world callers consume source_info downstream, so the placeholder choice matters. json_filter.rs gets By::filter(filter_path, 0); the other four get By::unknown(). Signature change: read_completing_source_info should accept default_by: By rather than baking unknown in, so callers declare their provenance up front. Flag: qmd-syntax-helper's qmd::write calls shift dispatch from R1-empty to R5-synthesize — the new behavior is correct. Item 3 (Phase 6.5 — reconciler "synthesis sites"): The line numbers in the earlier draft pointed to test code (AttrSourceInfo::empty field assignments in #[cfg(test)] blocks), not InlineAttr::new calls. The three real production InlineAttr::new sites live in pampa's tree-sitter lowering and pass non-empty attr_source; they need explicit source_info wired through from the surrounding parse range. By::reconcile_synthesize becomes a forward-looking primitive; no producer uses it at 7f-landing. Item 4 (Phase 1 — renderCustomNodeChildren): Verified preserves s: via { ...customNode, slots: ... } spread at dispatch.tsx:274. Both CustomBlock and CustomInline reach the same path. Move both from "needs verification" to "preserves." Open questions for review: - By::filter atomic-kind concern for external filter output (item 2 table). - Whether read_completing_source_info reuses UNKNOWN_SOURCE_INFO_ID when default_by == By::unknown() or always allocates fresh (recommend fresh for uniform path).
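The duplication accepted in Item 1 follows from keying the intern cache on pointer identity rather than on value. The sketch below illustrates only that consequence; `Pool` and `Entry` are hypothetical stand-ins, and the real serializer's "parent edges only" behavior is not modeled.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in payload for a pool entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry(pub String);

/// Hypothetical pool whose cache is keyed on Arc pointer identity.
#[derive(Default)]
pub struct Pool {
    entries: Vec<Entry>,
    // Pointer keys are only meaningful while the Arcs stay alive,
    // which is assumed to hold for the duration of one serialization.
    by_ptr: HashMap<*const Entry, usize>,
}

impl Pool {
    /// Returns the pool ID for `e`, reusing it only for the same allocation.
    pub fn intern(&mut self, e: &Arc<Entry>) -> usize {
        let key = Arc::as_ptr(e);
        if let Some(&id) = self.by_ptr.get(&key) {
            return id;
        }
        let id = self.entries.len();
        self.entries.push((**e).clone());
        self.by_ptr.insert(key, id);
        id
    }

    pub fn entry_count(&self) -> usize {
        self.entries.len()
    }
}
```

Two structurally equal entries behind different allocations get two IDs, which is the bounded, per-document duplication the decision accepts.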

…_synthesize, expand 6.5 scope Decisions locked in (2026-05-30 conversation): - Keep USER_EDIT_SOURCE_INFO_ID = 0 magic number (framework can't allocate into the Rust pool; the slot ID must be agreed in advance). - Drop UNKNOWN_SOURCE_INFO_ID and the second reserved slot. The completing reader takes `default_by: By` and allocates a fresh pool entry on every fill. No hand-mirror, no parity test for slot 1, no special case for `default_by == By::unknown()`. - Drop By::reconcile_synthesize entirely — no producer uses it at 7f-landing. - Add By::is_programmatic_sentinel() predicate covering config-default, programmatic-config, unknown. Replaces the navigation_href.rs equality check against SourceInfo::default(). No is_default() function needed. - By::unknown is non-atomic. By::filter is atomic and the right semantic for json_filter.rs (filter-added nodes shouldn't be source-editable). Phase 3 walker fix: The previous walker used a 't' in value heuristic to recurse into CustomNode slots, which would have misread the Slot wrapper ({ kind, value }) as a non-AST object and silently failed to stamp anything inside slots. Rewritten to dispatch on slot.kind per the actual TS Slot discriminated union at ts-packages/preview-renderer/src/framework/types.ts:123-128. Phase 6.5 expansion: Audit found additional production SourceInfo::default sites the plan missed: - config_value.rs:822, 826 (insert_path intermediates) → By::programmatic_config - project_resources.rs:541 (canonicalize_within_project sentinel) → By::unknown - navigation_href.rs:382 (equality check) → is_programmatic_sentinel pattern SchemaError::InvalidStructure scope corrected: 4 None sentinel sites (merge.rs:32/51/88, mod.rs:250), ~11 Some(value.source_info.clone()) sites in helpers.rs, plus a formatter at error.rs:33-46. Plan previously claimed "four call sites" — undercounted by 3×. Mechanical fixes: - InlineAttr::new line numbers 304-311 → 333-348 (the actual location). 
- JsonReadError line numbers 23/30 → 25/31. - writers/json.rs s:-bearing struct range 1010-1116 → 1068-1195. - Phase 7 deprecated Default impl: file: FileId(0) → file_id: FileId(0). - Phase 5: clarify the "remove the camelCase fallback" wording (no real fallback exists; the per-field rename overrides the macro). - ATOMIC_CUSTOM_NODES Rust + TS paths spelled out for the parity test. - attr.rs:45-46 stale doc-comment (claims SourceInfo::default fallback; real consumers fall back to None) noted for cleanup.

…lan 7d trust-point gate Decisions locked in (2026-06-01): - PandocNativeIntermediate::IntermediateAttr widens to carry SourceInfo alongside (Attr, AttrSourceInfo). Cleaner provenance than chasing source_info through three uneven call paths; one producer-side update versus three consumer-side refactors. - q2-debug uses the framework's <Node>, so Phase 3 stampUserEdits comes for free. Only one q2-debug-local renderer (Figure at components.tsx:110) needs the Phase 2 spread-fix. - Plan 7d's R5 trust point is enforced by `-D deprecated`. After Phase 7 lands the deprecation, denying it in CI turns every remaining SourceInfo::default() caller into a compile error. The compiler is the audit; no separate residue grep step needed. Audit results (four background agents, 2026-06-01): 1. Cross-crate residue: green. The 447 quarto-core SourceInfo::default hits dramatically overstate exposure. Actual production residue beyond Phase 6.5's list: citeproc/output.rs:1274, quarto-config/materialize.rs:132/152/165, quarto-core/project/listing/feed/stage.rs:596/602. All added to Phase 6.5 work items. 2. derive(Default) on SourceInfo-bearing structs: false alarm. None of the five candidate structs actually contain a SourceInfo field. The deprecation won't fire on them. Phase 8 audit step downgraded to a no-op note. 3. ConfigValue::default semantics shift: safe. Only 2 production callers (include_expansion.rs:203,238); both construct a transient Pandoc wrapper and discard the .meta field without reading it. Migration sound. 4. Snapshot churn: 62 .snap files in crates/pampa/snapshots/json/ (one directory). Other 167 snapshots unaffected. Phase 6's dispatch shift expected to produce zero snapshot diffs (the harness uses real-parsed AST, not defaults). Commit-split recommended: Phase 5 renames first, then Phase 4 pool-shift, then Phase 6 (expect no snap diffs). Plan now reads end-to-end with bounded scope and a compile-time enforcement mechanism. Ready for implementation.

… for rebase Prepares feature/provenance for rebase onto origin/main, which landed the integration-test consolidation (#239 / bd-xvdop): every crate now has a single `tests/integration/<name>.rs` + `tests/integration/main.rs` binary instead of one binary per `tests/<name>.rs`. `idempotence.rs` is the only test file on this branch that is NEW (no counterpart on main), so a straight rebase would land it in the deprecated old layout with zero conflict and zero signal — silently reintroducing the per-file-binary bloat #239 removed, caught by no lint or compile error. Move it into the new layout now, as an explicit, reviewable, build-verified commit, so the placement is a verified fact before the 85-commit replay rather than a post-rebase chore: - git mv tests/idempotence.rs -> tests/integration/idempotence.rs - add tests/integration/main.rs registering `pub mod idempotence;` On rebase, the new tests/integration/main.rs will collide (add/add) with main's version (~34 modules); resolution is a trivial union (keep main's list + idempotence). That loud conflict is the point — it can't be missed. Verified on this branch (pre-rebase): integration binary compiles; all 27 idempotence tests pass under `binary(integration)`. The genuinely-renamed test files (incremental_writer_tests.rs et al.) are left for rebase rename-detection to follow + a post-rebase structural check.

Earlier note implied 7b might use hand-crafted JSON that would need the strict-reader pattern. After reading 7b in full: it's qmd-focused test coverage that constructs ASTs directly in Rust and exercises the qmd writer (`incremental_write`, `compute_blocks_hash_fresh`). No JSON reads or wire-format assertions. 7b ships after 7f. The interaction is API-surface-only — 7b's authors write against the post-7f APIs from the start (for_test, 3-arg InlineAttr::new, widened IntermediateAttr). No rebase work needed.

…_INFO_ID
Plan 7f's 2026-05-30 research findings dropped two earlier-draft items:
- `By::reconcile_synthesize()`: no producer uses it at 7f-landing time; remove from the By:: catalog. Add back later if a reconciler path appears that synthesizes new AST without an input SourceInfo to inherit from.
- `UNKNOWN_SOURCE_INFO_ID` reserved pool slot: the completing reader takes a `default_by: By` parameter and allocates a fresh pool entry per missing `s:`, so there's no slot 1. Rewrite the `By::unknown()` row to describe the actual mechanism.
Brings provenance-contract.md back in sync with the plan; pre-Phase-1 cleanup so the catalog matches what ships.

Wrap-rebuild renderers in `dispatch.tsx` and the q2-debug `Figure` renderer were emitting a fresh `{ t: '<Tag>', c: newChildren }` object on every child edit, dropping `s:` (and every other top-level field) from the rebuilt parent.
After Phase 2:
- 19 stripping renderers (Emph/Strong, the five flat inline wrappers via `makeFlatInlineRenderer`, Link/Image/Span/Quoted, Para/Plain/Header/BlockQuote/Div, BulletList/OrderedList/Figure) now rebuild via `{ ...node, c: ... }`.
- q2-debug's local Figure renderer at `hub-client/src/components/render/q2-debug/components.tsx:110` gets the same spread treatment.
- `dispatch.test.tsx` covers all 22 entries in the `renderChildrenRegistry`: 19 that previously failed and the 3 that already preserved (`Ast`, `CustomBlock`, `CustomInline`).
Preserving `s:` is a precondition for the strict JSON reader landing in Plan 7f Phase 4. Without it, every child edit rebuilds an ancestor with no source_info reference, which the strict reader would reject.

…igure s: preservation)

…7f Phase 3) Wrap `<Node>`'s `setLocalAst` so every AST a user-edit affordance hands up the chain has `s:` populated on every node. The walker: - Stamps `s: USER_EDIT_SOURCE_INFO_ID` (slot 0) on any node lacking `s:`. - Leaves preserved nodes (those with existing `s:`) untouched, so the Phase 2 rebuilt-wrapper path keeps the original parent's source_info. - Recurses into `c:` (standard wrapper shape) and `slots:` (CustomNode shape, dispatched on `slot.kind`). - Walks nested arrays inside `c:` so Header / Link / BulletList shapes stamp their inner inline arrays correctly. Tagged-marker values (`{t: 'DisplayMath'}`, `{t: 'SingleQuote'}`) get a spurious `s:` field; serde-tag-based reads ignore it (markers are deserialized by tag, not by struct), so this is harmless. The atomic-gate noop path skips stamping — wasted work when the edit is dropped anyway. Stamping is per-node idempotent; outer-level rewalking of a stamped subtree is a no-op. `USER_EDIT_SOURCE_INFO_ID = 0` lands in `ts-packages/preview-renderer/src/types/sourceInfo.ts` here, ahead of Plan 7f Phase 4's Rust counterpart + hand-mirror parity test. Three plan-mandated tests + four robustness tests in `stampUserEdits.test.ts`: fresh Span stamping, rebuilt-wrapper preservation, splice-in (new + preserved siblings), CustomBlock slot recursion, `block`/`inline` single-value slot recursion, nested-array walks (Header c[2], BulletList items), idempotence.

…an 7f Phase 4) `By::unknown()` is the placeholder kind for nodes deserialized through `json::read_completing_source_info` when the upstream producer doesn't populate `s:` — qmd-syntax-helper's Pandoc subprocess output, CLI `--from json`, Lua AST handoff. Non-atomic by design: nodes carrying this kind remain editable in the preview; user edits re-stamp them as `user_edit` on save. Extends `test_by_is_atomic_kind` to assert non-atomicity, and adds a `test_by_unknown_constructor` that pins `kind == "unknown"` + null `data`. Phase 6.5's `is_programmatic_sentinel()` predicate will recognize this kind alongside `config-default` and `programmatic-config`.

…se 4) Splits the JSON reader's leniency into two named entry points: - `json::read` becomes strict — nodes missing their `s:` reference fail with `JsonReadError::MissingSourceInfoRef { node_path }`. The node_path is best-effort (tag name + parent context); good enough for a debugger to find the responsible producer site without the plumbing cost of a precise JSON-pointer. - `json::read_completing_source_info(reader, default_by: By)` fills missing `s:` with `Generated { by: default_by, from: [] }` in-place per node (no pool entries allocated on read — the writer creates the pool ID on re-serialize). Used by every site that consumes JSON from outside q2's source-tracking world. Five outside-world callers switched per the plan's per-caller table: - `json_filter.rs` → `By::filter(filter_path, 0)` (atomic-kind for filter-added nodes; pass-through nodes keep their original `s:`). - `qmd-syntax-helper/{definition_lists,grid_tables}.rs` → `By::unknown()`. Writer dispatch for these nodes shifts from R1-empty to R5-synthesize, which is the correct round-trip behavior. - `pampa/src/main.rs` (CLI `--from json`) → `By::unknown()`. - `pampa/src/lua/readwrite.rs` (Lua `pandoc.read(_, "json")`) → `By::unknown()`. The strict reader catches two real writer bugs that previously round-tripped silently through `SourceInfo::default()`: 1. `write_custom_block` and `stream_write_custom_block` synthesized `Plain`/`Div` wrappers for slot encoding without `s:`. Same shape in `write_custom_inline` / `stream_write_custom_inline` for the `Span` wrapper and the `[block content]` placeholder Str. All now inherit the parent CustomNode's `s_id`. 2. `Figure` did not emit `captionS` (Table did). Strict reader rejected Figure captions; added `captionS` to both the buffered and streaming Figure writers, and updated the Figure reader to consume it. Same shape as Table's `captionS`. 
Tests: - `json_reader_smoke_tests.rs` reads Pandoc-format fixtures under `tests/readers/json/` — switched to `read_completing_source_info`. - `test_json_div_transforms.rs` mimics `--from json` with hand-crafted pampa JSON — switched to match `main.rs`. - Full pampa suite (3903 tests) + workspace suite (9727 tests) green. Required adding `quarto-source-map` as a direct dep of `qmd-syntax-helper` (previously transitive through `pampa`).
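The strict / completing split can be sketched over a toy node model. Everything here is illustrative: `RawNode`, `Node`, and `read_inner` are hypothetical, the real readers consume JSON rather than a pre-parsed tree, and the real completing reader presumably still returns a `Result` for other error kinds. Only the names `read`, `read_completing_source_info`, `default_by`, and `MissingSourceInfoRef { node_path }` come from the commit message.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    /// Reference into the source-info pool (the `s:` field).
    PoolRef(usize),
    /// Placeholder filled in by the completing reader.
    Generated { by: By },
}

/// Toy stand-in for a node as it appears on the wire.
#[derive(Debug, Clone, PartialEq)]
pub struct RawNode {
    pub tag: String,
    pub s: Option<usize>,
    pub children: Vec<RawNode>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub tag: String,
    pub source_info: SourceInfo,
    pub children: Vec<Node>,
}

#[derive(Debug, PartialEq)]
pub enum JsonReadError {
    MissingSourceInfoRef { node_path: String },
}

fn read_inner(raw: &RawNode, parent: &str, default_by: Option<&By>) -> Result<Node, JsonReadError> {
    // Best-effort path: tag name plus parent context.
    let node_path = if parent.is_empty() {
        raw.tag.clone()
    } else {
        format!("{parent}/{}", raw.tag)
    };
    let source_info = match (raw.s, default_by) {
        (Some(id), _) => SourceInfo::PoolRef(id),
        (None, Some(by)) => SourceInfo::Generated { by: by.clone() },
        (None, None) => return Err(JsonReadError::MissingSourceInfoRef { node_path }),
    };
    let children = raw
        .children
        .iter()
        .map(|c| read_inner(c, &node_path, default_by))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Node { tag: raw.tag.clone(), source_info, children })
}

/// Strict entry point: a node without `s:` is an error.
pub fn read(raw: &RawNode) -> Result<Node, JsonReadError> {
    read_inner(raw, "", None)
}

/// Completing entry point: a node without `s:` gets `default_by`.
pub fn read_completing_source_info(raw: &RawNode, default_by: By) -> Node {
    read_inner(raw, "", Some(&default_by))
        .expect("with a default, a missing s: is never an error in this sketch")
}
```

Nodes that already carry `s:` keep their reference under both entry points, matching the note that pass-through nodes keep their original `s:`.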

…ool`→`p` (Plan 7f Phase 5) Phase 5 of Plan 7f compacts two top-level JSON keys to match the rest of the wire format's single-character convention. Writer (`crates/pampa/src/writers/json.rs`): * `#[serde(rename = "a")]` on `NodeWithAttrJson::attr_s`; field-order invariant preserved (a, c, s, t still alphabetic). * `#[serde(rename = "p")]` on `AstContextJson::source_info_pool`; alphabetic order under `astContext` preserved (files, metaTopLevelKeySources, p). * All 24 literal `"attrS"` keys and the 1 `"sourceInfoPool"` key in object-construction sites updated; doc comments + the Figure inline order-comment rewritten for the new alphabet (a, c, captionS, s, t). Reader (`crates/pampa/src/readers/json.rs`): symmetric reads of the new keys (14 sites + 1 pool key); error variant messages and the deserializer doc-block now reference `p` and `a` while keeping the human-readable name "source-info pool". TS: * `ts-packages/pandoc-types/src/types.ts` — 11 `attrS:` interface fields → `a:`; `RustQmdJson.astContext.sourceInfoPool` → `p`. * `ts-packages/preview-renderer/src/types/sourceInfo.ts` & `framework/Ast.tsx` — `AstContext.p` is the wire-format key; the internal React-context field stays `sourceInfoPool` for readability. * `ts-packages/annotated-qmd/src/{index.ts,block-converter.ts, inline-converter.ts}` — wire-format accesses (`block.attrS`/`inline.attrS`/`headS.attrS`/etc. → `.a`; `json.astContext.sourceInfoPool` → `.p`); internal parameter `attrS` renamed to `attrSource` for clarity. Tests, README, `debug-figure.js`, and `check_mismatches.py` follow. Audit (2026-06-01) confirmed `hub-client/`, `q2-preview-spa/`, and `crates/hub/` don't pattern-match on these keys — they delegate to the TS type packages. Snapshot regeneration: * 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated via `INSTA_UPDATE=always cargo nextest run -p pampa`. 
Diff is pure key rename (`"attrS":`→`"a":`, `"sourceInfoPool":`→`"p":`) plus a refreshed snapshot-source header reflecting the post- bd-xvdop integration-tests layout (`tests/test.rs` → `tests/integration/test.rs`). Commit sequencing: Phase 5 renames land first; Phase 4's pool-slot-0 commit follows and regenerates the same 62 files for the +1 ID shift. Example-fixture regeneration: * 20 `ts-packages/annotated-qmd/examples/*.json` + the `test/fixtures/math-with-attr.json` rebuilt by running `cargo run --bin pampa -- -t json -i <each>.qmd`. Committed fixtures dated to 2025-10-24 (commit 2b2337b) and were stale against multiple unrelated pampa releases; regeneration is required for the TS code (which now reads `a`/`p`) to find any data at all. Docs: * `claude-notes/designs/provenance-contract.md` — wire-format key references updated to `astContext.p`. * `claude-notes/instructions/performance-profiling.md` — Python canonicalize snippet uses `astContext["p"]`. * Historical plans/research notes intentionally retain `attrS` / `sourceInfoPool` since they describe state-as-of-then. Verification: * `cargo nextest run --workspace` → 9727 pass. * `cargo xtask verify` (full hub-build leg) → all 12 steps green including WASM rebuild + q2-preview-spa bundle. * `hub-client` unit tests → 82/82 pass. * `preview-renderer` tests → 205/205 pass. Known side-issue (not blocking): `annotated-qmd` shows 2/156 test failures — pre-existing pampa source-tracking off-by-one (inline code + div key-source spans capture a preceding whitespace byte). Filed as `bd-1d6io` with suspected-cause investigation pointing at commit `38e889ad` (2026-05-24, multi-line inline-code-span tokenization rework). Phase 5 only renamed JSON keys; no offset computation was touched. Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md

…se 4 pool-slot) The React framework's `stampUserEdits` walker (Plan 7f Phase 3) stamps `s: USER_EDIT_SOURCE_INFO_ID` on every AST node a `setLocalAst` call introduces without an existing `s:`. Until now the constant existed only on the TS side (added in commit `7ac9f445`); the Rust writer never pre-populated slot 0, so the stamp resolved to whatever happened to be interned first in each document. Most stamps landed on benign `Original{0..0}` entries, but the semantic was wrong — `s:0` should *mean* "this came from a user edit", not "this was the first thing interned." This commit makes the round-trip honest. Writer (`crates/pampa/src/writers/json.rs`): * `pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;` defined alongside `SourceInfoSerializer`. Docstring chains future reserved slots via `+ 1` and points at the TS hand-mirror. * `SourceInfoSerializer::new()` now pre-pushes a `Generated{by: By::user_edit(), from: vec![]}` entry at index 0. The slot exists in every JSON document the writer produces regardless of whether any node references it. * The 9 writer-side unit tests that asserted `pool.len() == N` after N interns now express N as `USER_EDIT_SOURCE_INFO_ID + 1 + N` (using `let first_user_id = USER_EDIT_SOURCE_INFO_ID + 1;` locally) so a future second reserved slot doesn't silently break call sites. * New `test_reserved_slot_user_edit` pins the layout: a fresh serializer has `pool[USER_EDIT_SOURCE_INFO_ID]` carrying `Generated{by: user_edit, from: [], r: [0,0]}`. Rearranging reserved slots fails this test rather than silently shifting IDs. * New `test_user_edit_slot_id_matches_typescript_mirror` reads `ts-packages/preview-renderer/src/types/sourceInfo.ts` via `CARGO_MANIFEST_DIR`-relative path, parses the `export const USER_EDIT_SOURCE_INFO_ID = N;` literal, and asserts `N == 0`. Catches rename, restructure, or value drift on either side. 
Reader-side and TS-side tests that construct their own pool literals were left as-is — they're not asserting against the writer's reserved-slot contract. Snapshot regeneration: * 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated. Diff is exactly the plan's predicted shape: every `"s":N` reference shifts to `"s":N+1`, every `Concat` piece `source_info_id` shifts by +1, and each pool gains a new entry at index 0: `{"d":{"by":{"kind":"user-edit"}},"r":[0,0],"t":4}`. Example-fixture regeneration: * 20 `ts-packages/annotated-qmd/examples/*.json` + `test/fixtures/math-with-attr.json` rebuilt by running `cargo run --bin pampa -- -t json -i <each>.qmd`. Same +1 shift on every `s:` reference plus the new pool[0] entry. Required because the TS test suite reads these fixtures and indexes into the pool by the `s:` field. Verification: * `cargo nextest run --workspace` → 9731 pass (+4 vs Phase 5: the new reserved-slot and TS-parity tests, each running once as a unit test and once via the integration binary). * `cargo xtask verify` (full hub-build leg) → all 12 steps green including WASM rebuild + q2-preview-spa bundle. * annotated-qmd: 2/156 known failures remain (bd-1d6io, source-tracking off-by-one) — unchanged from Phase 5; not caused by this pool shift. Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md
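A rough sketch of the reserved-slot contract, with a simplified pool. `PoolEntry` and the `intern` / `pool` methods are hypothetical stand-ins for the real serializer; the constant name, its value, the `user-edit` kind, and the `first_user_id` idiom are taken from the commit message.

```rust
/// Slot 0 of every pool means "this node came from a user edit".
pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;

/// Simplified stand-in for a serialized pool entry.
#[derive(Debug, Clone, PartialEq)]
pub enum PoolEntry {
    Original { start: usize, end: usize },
    Generated { kind: &'static str },
}

pub struct SourceInfoSerializer {
    pool: Vec<PoolEntry>,
}

impl SourceInfoSerializer {
    /// Pre-pushes the reserved slot so it exists in every document,
    /// whether or not any node references it.
    pub fn new() -> Self {
        let pool = vec![PoolEntry::Generated { kind: "user-edit" }];
        debug_assert_eq!(pool.len(), USER_EDIT_SOURCE_INFO_ID + 1);
        Self { pool }
    }

    /// Appends an entry and returns its pool ID (no dedup in this sketch).
    pub fn intern(&mut self, entry: PoolEntry) -> usize {
        let id = self.pool.len();
        self.pool.push(entry);
        id
    }

    pub fn pool(&self) -> &[PoolEntry] {
        &self.pool
    }
}
```

Tests written against `first_user_id = USER_EDIT_SOURCE_INFO_ID + 1` rather than a literal `1` keep working if a second reserved slot is ever added ahead of user entries.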

…tructors Foundation for Plan 7f Phase 6 (test audit) and Phase 6.5 (production residue sweep). Adds, in `crates/quarto-source-map/src/source_info.rs`: - `By::test_scaffold()` — non-atomic, `kind: "test-scaffold"`. Paired with `SourceInfo::for_test()` for tests that need a `SourceInfo` field but have no real provenance to record. - `SourceInfo::for_test()` — convenience that returns `Generated{by: test_scaffold(), from: []}`. Replaces `SourceInfo::default()` in test code; intentionally produces *different* writer dispatch (R5/R3 synthesize vs R1-empty-range copy) because the new behavior is the correct one for AST without real source bytes. - `By::config_default()` / `By::programmatic_config()` — non-atomic sentinel kinds for `ConfigValue` residue sites (Phase 6.5 `config_value.rs` fixes lean on these). - `By::is_programmatic_sentinel()` — predicate matching `config-default | programmatic-config | unknown`. Replaces the pre-7f `source == &SourceInfo::default()` comparison in `navigation_href.rs`. Six new unit tests cover: constructor shape (kind/data) for each new `By::*`, non-atomicity for all four new kinds, `is_programmatic_sentinel` positive/negative cases, and `SourceInfo::for_test` shape. The existing `test_by_is_atomic_kind` was extended with three new negative assertions so a future change can't silently promote `test-scaffold`, `config-default`, or `programmatic-config` to atomic without breaking the test. No production callers yet — those land in subsequent commits per the Phase 6 / 6.5 work-item split in CURRENT.md.
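The new constructors and the sentinel predicate might look roughly like this. The sketch reduces `By` to a single `kind` field (the real type also carries `data`); the kind strings and method names are the ones named in the commit message.

```rust
/// Simplified stand-in: the real `By` also carries a `data` payload.
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: &'static str,
}

impl By {
    pub fn unknown() -> Self {
        By { kind: "unknown" }
    }

    pub fn test_scaffold() -> Self {
        By { kind: "test-scaffold" }
    }

    pub fn config_default() -> Self {
        By { kind: "config-default" }
    }

    pub fn programmatic_config() -> Self {
        By { kind: "programmatic-config" }
    }

    /// True for kinds that mark "no real source location", replacing
    /// equality checks against `SourceInfo::default()`.
    pub fn is_programmatic_sentinel(&self) -> bool {
        matches!(
            self.kind,
            "config-default" | "programmatic-config" | "unknown"
        )
    }
}
```

Note that `test-scaffold` is deliberately outside the predicate: it marks test fixtures, not programmatic configuration.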

…olding

Plan 7f Phase 6: first batch of the test audit. All sites in this commit are structural test scaffolding (constructors that require a SourceInfo field; no real source bytes exist for the hand-crafted fixture).

Sites touched (test code only):

- crates/quarto-xml/src/types.rs: 11 sites in `mod tests` (XmlAttribute / XmlElement constructor scaffolding).
- crates/quarto-yaml-validation/src/tests.rs: 3 sites in `make_yaml_*` helpers.
- crates/quarto-yaml-validation/src/validator.rs: 14 sites inside the file's `#[cfg(test)] mod tests` (yaml_scalar / yaml_array / yaml_object / test_navigate_nested fixtures).
- crates/quarto-yaml-validation/src/schema/parsers/combinators.rs:66: local `source_info()` test helper.
- crates/quarto-yaml-validation/src/schema/helpers.rs:172: same pattern, local `source_info()` test helper.
- crates/quarto-ast-reconcile/src/generators.rs:631: proptest generator for Shortcode.
- crates/quarto-core/tests/integration/{jupyter_integration, navigation_e2e, navigation_merge, engine_merge, attribution_*}.rs: 35 sites across the 8 quarto-core integration tests that build hand-crafted Pandoc AST + ConfigValue fixtures.

Behavior implications (per CURRENT.md's writer dispatch note):

- `SourceInfo::default()` is `Original{FileId(0),0,0}` → `preimage_in(FileId(0))` returns `Some(0..0)` (empty range) → R1 copies zero bytes. `for_test()` is `Generated{by:test_scaffold, from:[]}` → `preimage_in` returns `None` → R5/R3 synthesize (or pass-through wrapper). The new behavior is the correct one for AST with no real source bytes, and no test in this batch asserts on writer byte output.
- `navigation_href.rs:382` still uses `source == &SourceInfo::default()` (Phase 6.5 will swap this to `is_programmatic_sentinel()`). For the navigation_e2e / _merge / attribution tests in this commit, the swap is benign: `for_test()` no longer equals `default()`, but `resolve_byte_range()` returns `None` for the empty-from `Generated`, so navigation_href takes the "Concat/Filter" fall-through path and returns `raw` unchanged, the same outcome as the old explicit short-circuit.

Schema/merge.rs:32,51,88 and schema/mod.rs:256 (the 4 production SchemaError::InvalidStructure sites) intentionally not touched; they belong to Phase 6.5's `location: Option<SourceInfo>` refactor.

Test results: per-crate `cargo nextest run` clean across all four crates (24/24 quarto-xml, 265/265 quarto-yaml-validation, 218/218 quarto-ast-reconcile, 2199/2199 quarto-core).
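The dispatch difference between `default()` and `for_test()` can be sketched as follows. This is a simplified model: `WriterAction` and `dispatch` are hypothetical names, the rule labels R1 and R5/R3 are reduced to two outcomes, and the real `preimage_in` presumably also handles `Generated` values with non-empty `from`, which this sketch ignores.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
struct FileId(usize);

/// Two-variant model of `SourceInfo` (assumption: the real enum is richer).
enum SourceInfo {
    Original { file_id: FileId, start: usize, end: usize },
    Generated { from: Vec<SourceInfo> },
}

impl SourceInfo {
    /// Byte range this node occupies in `target`, if it has one there.
    /// Simplification: every `Generated` value is treated as having none.
    fn preimage_in(&self, target: FileId) -> Option<Range<usize>> {
        match self {
            SourceInfo::Original { file_id, start, end } if *file_id == target => {
                Some(*start..*end)
            }
            _ => None,
        }
    }
}

/// Hypothetical stand-in for the writer's choice between copying and synthesizing.
#[derive(Debug, PartialEq)]
enum WriterAction {
    /// R1-style: copy the source bytes in the given range verbatim.
    CopyBytes(Range<usize>),
    /// R5/R3-style: no source bytes exist, so render the node from the AST.
    Synthesize,
}

fn dispatch(info: &SourceInfo, target: FileId) -> WriterAction {
    match info.preimage_in(target) {
        Some(range) => WriterAction::CopyBytes(range),
        None => WriterAction::Synthesize,
    }
}
```

Under this model the old default yields `CopyBytes(0..0)`, a copy of zero bytes, while an empty-`from` generated value yields `Synthesize`.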

Plan 7f Phase 6: pampa batch.

All swapped sites are test scaffolding: pampa/tests/* (the 18 integration test files, 156 sites) and the `#[cfg(test)] mod tests` blocks inside pampa/src/* (85 sites). Plus crates/pampa/src/lua/filter_tests.rs (included via `#[cfg(test)] #[path = "filter_tests.rs"] mod`; the whole file is test code, 156 more sites).

Test results: `cargo nextest run -p pampa` clean (3907/3907 pass, 2 skipped). No assertion-on-byte-output tests in this batch regressed under the R1-empty-range → R5/R3-synthesize dispatch shift that follows from for_test()'s non-Original shape.

Production-residue audit (deferred): per `git grep 'SourceInfo::default()' crates/pampa/src/`, 42 sites remain in pampa src that are NOT inside `#[cfg(test)]`. Per-file breakdown:

- `readers/json.rs`: 7 sites, all marked "Legitimate default: backward compat" for legacy Pandoc JSON without source info. Explicitly allowed by `provenance-contract.md` §10. Will need `#[allow(deprecated)]` annotations under Phase 7's `#![deny(deprecated)]`.
- `lua/types.rs` (8), `lua/utils.rs` (10), `lua/readwrite.rs` (2): Lua-side fallbacks where `filter_source_info` is expected to overwrite `SourceInfo::default()` with `Generated{by:filter,…}` before the AST is consumed. Producer contract acknowledges this pattern at the call-stack level.
- `citeproc_filter.rs` (3), `pandoc/meta.rs` (3), `writers/json.rs` (2), `toc.rs` (2), `template/config_merge.rs` (5): genuine production residue the Phase 6.5 plan did NOT enumerate. Most need a new `By::citeproc()`/`By::yaml_error_recovery()`/`By::toc_synth()` kind or routing through `By::programmatic_config()` / `By::unknown()`. Surfacing as a per-site decision before Phase 7 deprecation lands.

This commit ships the 312 test-only swaps (test_scaffold writer dispatch is benign for tests that don't assert on byte output). Production sites tracked separately for Plan 7f Phase 6.5 extension.

Plan 7f Phase 6: final test-audit batch.

Covers all remaining crates with `SourceInfo::default()` test-scaffolding sites: 57 PURE_TEST files (where no production residue exists) bulk-swapped end-to-end, plus 28 MIXED files where the swap was scoped to the `#[cfg(test)] mod tests` region. Plus one `tests/integration/*.rs` file (`quarto-sass/.../brand_config_test.rs`) that's all test code by virtue of living under `tests/`.

Affected crates: quarto-core (all transforms, stages, engine helpers, project plumbing), quarto-navigation (all subviews), quarto-pandoc-types/config_value.rs (95 test sites + 1 unused sentinel-equality test pinned to default(); see below), quarto-pandoc-types/inline.rs, quarto-config (all submodules), quarto-sass, quarto-doctemplate, quarto-yaml, quarto-publish, plus the integration brand_config_test.

Two assertion-pin fixes after sed swept too eagerly:

- `quarto-core/src/stage/stages/engine_execution.rs:1378`: `test_execution_context_has_source_info` asserts against the production `ExecutionContext::new` default. RHS reverted to `SourceInfo::default()` with a comment; Phase 7's deprecation will surface engine/context.rs:92 as a residue site and the assertion gets updated alongside.
- `quarto-pandoc-types/src/inline.rs:1459`: `source_info_attr_empty` pins the `InlineAttr::new` fallback. RHS reverted to `default()`; this test is on Phase 6.5's deletion list (the InlineAttr::new signature refactor removes the fallback entirely).

Production residue remains (not part of this commit, surfaced for Phase 6.5 + Phase 7):

- Planned Phase 6.5 sites (enumerated in CURRENT.md): config_value.rs (5), project_resources.rs (2), navigation_href.rs (1+2 follow-up), citeproc/output.rs (1), config/materialize.rs (3), listing/feed/ stage.rs (2), yaml-validation/schema/merge.rs+mod.rs (4), pandoc-types/inline.rs (InlineAttr refactor + IntermediateAttr widening, ~10 sites).
- Discovered residue not in plan: ~70 additional production sites across pampa (citeproc_filter, toc, pandoc/meta, writers/json, template/config_merge, lua/types, lua/utils, lua/readwrite), quarto-analysis, quarto-core engine/jupyter, quarto-core transforms (callout_resolve, categories_sidebar, shortcode_resolve, sidebar_auto, theorem, …), quarto-navigation. These will be surfaced by Phase 7's `#![deny(deprecated)]` once the deprecation attribute lands; fixes can be applied per-site or temporarily allow-listed at that time.
- Legitimate `SourceInfo::default()` calls retained per the producer contract: 7 in `pampa/src/readers/json.rs` (Pandoc legacy-JSON backward compat, explicitly allowed by `provenance-contract.md` §10), 1 in `quarto-source-map/src/source_info.rs` (the actual `impl Default for SourceInfo` body; Phase 7 deprecates this).

Workspace tests: 9736/9736 pass, 196 skipped.

…config_default / programmatic_config

Plan 7f Phase 6.5: first production-residue commit. Replaces four of the five `SourceInfo::default()` sites in `crates/quarto-pandoc-types/src/config_value.rs` with explicit `Generated{by:…}` provenance:

- `impl Default for ConfigValue` (line 415) → `Generated{by: By::config_default()}`. The empty-Map sentinel used by every `ConfigValue::default()` caller.
- `ConfigValue::from_path` (line 539) → `Generated{by: By::programmatic_config()}`. WASM-bridge programmatic injection.
- `ConfigValue::insert_path` intermediate map + key_source (lines 822, 826) → same `programmatic_config` provenance.
- Doc-comment example for `insert_path` updated to show the new shape.

(Fifth `default()` site was on the assertion side of the now-fixed `source_info_attr_empty` test; that test still asserts against the production fallback in `InlineAttr::new`, which Phase 6.5's InlineAttr refactor removes.)

Reader-side compatibility: `crates/pampa/src/readers/json.rs:2212` (top-level meta) updated to match. The JSON wire format does not carry a per-meta `s:` field (Pandoc-compatible), so the reader stamps the meta with the same `config_default` kind the writer's `ConfigValue::default()` now produces. Without this, every JSON round-trip would observably drop the meta's source_info; the `test_json_roundtrip_simple_paragraph` test caught it. The five other "Legitimate default" sites in the same function (2191/2195/2199/2315/2339, backward-compat for legacy Pandoc-only JSON without `key_sources`) are deliberately left as `SourceInfo::default()` for now; Phase 7's deprecation will surface them as `#[allow(deprecated)]` candidates.

Workspace tests: 9736/9736 pass, 196 skipped.

Plan 7f Phase 6.5: second production-residue commit. Replaces the remaining enumerated sites in `quarto-core`:

- `crates/quarto-core/src/project_resources.rs:123`: `Pattern::without_source` was using `SourceInfo::default()` as a scaffolding sentinel. Now `Generated{by: By::unknown()}`.
- `crates/quarto-core/src/project_resources.rs:541`: Engine / Lua-filter resource entries don't carry a YAML source location; the call to `canonicalize_within_project` still requires a `SourceInfo` per the current signature. Replaced `&SourceInfo::default()` with `&SourceInfo::generated(By::unknown())`. Follow-up beads issue **bd-3az78** filed to refactor `canonicalize_within_project` to take `Option<&SourceInfo>`.
- `crates/quarto-core/src/transforms/navigation_href.rs:382`: the programmatic-sentinel detector. Pre-Phase-6.5 code compared `source == &SourceInfo::default()`; that equality survives only as long as `Original{FileId(0),0,0}` is the canonical sentinel. Replaced with the producer-side predicate: `let SourceInfo::Generated { by, .. } = source && by.is_programmatic_sentinel()`. Matches the `config-default | programmatic-config | unknown` set introduced earlier in Phase 6.5. Doc-comment for the function updated to describe the new shape.

Workspace tests: 9736/9736 pass, 196 skipped.
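To make the `navigation_href.rs` change concrete, here is a sketch contrasting the old equality check with the new predicate. The types are reduced models, `old_check` and `new_check` are hypothetical names, and `matches!` with a guard is used in place of the let-chain quoted in the commit so the example does not depend on a recent Rust edition.

```rust
/// Reduced model of `By`: just a kind string (assumption).
#[derive(Debug, Clone, PartialEq)]
struct By {
    kind: &'static str,
}

impl By {
    fn is_programmatic_sentinel(&self) -> bool {
        matches!(self.kind, "config-default" | "programmatic-config" | "unknown")
    }
}

/// Reduced model of `SourceInfo` with only the two variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum SourceInfo {
    Original { file_id: usize, start: usize, end: usize },
    Generated { by: By },
}

impl Default for SourceInfo {
    /// Mirrors the `Original{FileId(0),0,0}` default described in the commit.
    fn default() -> Self {
        SourceInfo::Original { file_id: 0, start: 0, end: 0 }
    }
}

/// Old detector: only recognises the all-zero `Original` value.
fn old_check(source: &SourceInfo) -> bool {
    source == &SourceInfo::default()
}

/// New detector: recognises any generated value with a sentinel kind.
fn new_check(source: &SourceInfo) -> bool {
    matches!(source, SourceInfo::Generated { by, .. } if by.is_programmatic_sentinel())
}
```

Once producers emit `Generated{by: programmatic_config}` instead of the default, the old comparison stops matching them, while the predicate keeps working.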

…tion → Option<SourceInfo>

Plan 7f Phase 6.5: eliminates the last residual `SourceInfo::default()` sites in quarto-yaml-validation. The variant's location field is now `Option<SourceInfo>`, distinguishing two semantically distinct cases:

- **`Some(...)`**: error arose while validating user-supplied YAML against a schema. ~33 call sites in `schema/{helpers,parser,parsers/*}.rs` already pass a real `value.source_info.clone()` / `item.source_info.clone()` from the parsed YAML node; each wrapped in `Some(...)`.
- **`None`**: error describes a bug in the schema *definition* itself (no user-YAML to point at). 4 sites: `schema/merge.rs:32, 51, 88` and `schema/mod.rs:250` previously passed `quarto_yaml::SourceInfo::default()` as a placeholder.

Formatter (`error.rs:33-46`) now branches on `Option`: present → `"… (at offset N)"`, absent → no span suffix.

Test pattern-matching at all destructure sites uses `{ message, .. }` so no test code needed updating. Added a regression test `test_schema_error_invalid_structure_display_no_location` for the new None branch.

Compiler walked through 37 mismatched-types errors across 7 files and the `Some(...)`-wrap is mechanical at every call site (the right answer is what `rustc --explain E0308` literally suggests).

Workspace tests: 9737/9737 pass, 196 skipped (one new test).
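The formatter branch on `Option` might look roughly like this. It is a sketch only: the `SourceInfo` struct with a single `offset` field and the message prefix are assumptions made for the example, and only the "(at offset N)" suffix and the variant name come from the commit message.

```rust
use std::fmt;

/// Stand-in for the YAML source location (assumption: one offset field).
#[derive(Debug, Clone, PartialEq)]
struct SourceInfo {
    offset: usize,
}

#[derive(Debug)]
enum SchemaError {
    InvalidStructure {
        message: String,
        /// `Some` for errors in user YAML, `None` for schema-definition bugs.
        location: Option<SourceInfo>,
    },
}

impl fmt::Display for SchemaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Location present: append the span suffix.
            SchemaError::InvalidStructure { message, location: Some(loc) } => {
                write!(f, "Invalid schema structure: {} (at offset {})", message, loc.offset)
            }
            // Location absent: no span suffix at all.
            SchemaError::InvalidStructure { message, location: None } => {
                write!(f, "Invalid schema structure: {}", message)
            }
        }
    }
}
```

Callers that only care about the text can keep destructuring with `{ message, .. }`, which is why the existing tests needed no changes.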

…t SourceInfo

Plan 7f Phase 6.5: eliminates the empty-AttrSourceInfo sentinel that was the last `SourceInfo::default()` site in `quarto-pandoc-types`.

`InlineAttr::new` is now a three-argument constructor that requires the caller to supply a real `source_info`. A `new_from_attr_source` convenience preserves the "derive from non-empty AttrSourceInfo" path for the two test sites that legitimately want it.

Producer-side: widened the `PandocNativeIntermediate::IntermediateAttr` enum variant from `(Attr, AttrSourceInfo)` to `(Attr, AttrSourceInfo, SourceInfo)`, paying the source_info acquisition once at the producer instead of three times at each consumer. All three production consumers (`treesitter.rs:558`, `treesitter_utils/caption.rs:35`, `treesitter_utils/paragraph.rs:27`) now destructure the third field and pass it straight through to `InlineAttr::new`.

Producer constructors that emit `IntermediateAttr`:

- `treesitter.rs:1166` (commonmark_specifier): passes `node_source_info_with_context(node, context)`.
- `treesitter.rs:1183` (unnumbered_specifier): same.
- `treesitter.rs:1202` (attribute_specifier empty fallback): same.
- `treesitter_utils/commonmark_attribute.rs:58`: gained a `span` parameter; callers supply it from their local tree-sitter node.
- `treesitter_utils/info_string.rs:30`: re-uses the language-source range (no separate parent span available).
- `treesitter_utils/language_specifier.rs:116`: uses `node_source_info_with_context(node, context)` over the language_specifier node.
- `treesitter_utils/language_specifier.rs:161`: dead-code fallback in `process_nested_language_specifier`, updated for consistency.

Eight consumer destructure sites updated to ignore the new third field with `, _` (atx_heading, code_span_helpers, editorial_marks, fenced_code_block ×2, fenced_div_block, span_link_helpers ×2). None of these production consumers currently uses the intermediate's source_info; they take their span from the parent tree-sitter node directly.

Test-code call sites (`InlineAttr::new(empty_attr(), AttrSourceInfo::empty(), …)`), six sites in `filters.rs`, `writers/plaintext.rs`, `lua/types.rs`, `lua/filter.rs`, pass `SourceInfo::for_test()` as the third argument. Two test sites in `inline.rs` that exercise the `AttrSourceInfo` → `source_info` derivation moved to the `new_from_attr_source` convenience method.

Deletes the obsolete `source_info_attr_empty` test (the case it asserted, empty AttrSourceInfo + InlineAttr::new fallback to `SourceInfo::default()`, is now structurally impossible).

Doc-comment for `AttrSourceInfo` at `attr.rs:44-46` updated: the old "fall back to `SourceInfo::default()`" recipe no longer matches reality (theorem.rs and proof.rs fall back to `None` already).

Workspace tests: 9736/9736 pass, 196 skipped.
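The variant widening and the two consumer styles can be sketched as below. All types are hollowed-out stand-ins, and `to_inline_attr` and `attr_only` are hypothetical consumer names; only the three-field variant shape and the three-argument `InlineAttr::new` follow the commit message.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Attr(String);

#[derive(Debug, Clone, PartialEq)]
struct AttrSourceInfo;

#[derive(Debug, Clone, PartialEq)]
struct SourceInfo {
    start: usize,
    end: usize,
}

/// One-variant model of the intermediate enum, with the widened third field.
enum Intermediate {
    IntermediateAttr(Attr, AttrSourceInfo, SourceInfo),
}

#[derive(Debug, PartialEq)]
struct InlineAttr {
    attr: Attr,
    attr_source: AttrSourceInfo,
    source_info: SourceInfo,
}

impl InlineAttr {
    /// Three-argument constructor: the caller must supply a real span.
    fn new(attr: Attr, attr_source: AttrSourceInfo, source_info: SourceInfo) -> Self {
        InlineAttr { attr, attr_source, source_info }
    }
}

/// Consumer that needs the span: passes the third field straight through.
fn to_inline_attr(node: Intermediate) -> InlineAttr {
    let Intermediate::IntermediateAttr(attr, attr_source, source_info) = node;
    InlineAttr::new(attr, attr_source, source_info)
}

/// Consumer that takes its span elsewhere: ignores the new field with `_`.
fn attr_only(node: Intermediate) -> Attr {
    let Intermediate::IntermediateAttr(attr, _, _) = node;
    attr
}
```

Because the span travels inside the variant, there is no longer a code path where `InlineAttr::new` has to invent a placeholder location.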

Phase 6 (test audit) and Phase 6.5 (production residue sweep — enumerated sites) are complete; full `cargo xtask verify` passes all 12 steps including the WASM build leg. Plan file checkboxes updated and a new "Discovered production residue" section catalogues the ~70 unplanned `SourceInfo::default()` sites the Phase 6 sweep surfaced. Per the plan's "-D deprecated strategy", these are deferred to Phase 7's compiler-driven audit.

Plan 7f Phase 6.5 extension: apply explicit `By::*` kinds to the ~25 pampa production sites the original plan didn't enumerate.

New `By::citeproc()` constructor (atomic, non-sentinel): citeproc-rendered content (citation Str replacements, bibliography `Div`s, `#refs` wrappers) generated by CSL processing. Atomic: the user edits citation styles via CSL, not through the preview's inline editing surface. Added to `is_atomic_kind()`'s match arm.

Per-site fixes:

- `pampa/src/template/config_merge.rs` (5 sites: lang default, pagetitle, top-level template-defaults map) → `By::config_default()`. Template defaults are the canonical "no value in user config, use this fallback" semantic.
- `pampa/src/toc.rs:98, 190` (TocEntry → ConfigValue, NavigationToc → ConfigValue) → `By::programmatic_config()`. Programmatic derivation from in-memory TOC structures.
- `pampa/src/citeproc_filter.rs` (3 sites: citation Str, bib entry Div, refs Div wrapper) → `By::citeproc()`.
- `pampa/src/pandoc/meta.rs:95, 97, 231` (yaml-markdown-syntax-error recovery Span + yaml-tagged-string Span) → reuse the caller's `source_info` for both wrapper and inner Str so attribution points at the offending YAML range. The wrapper has the same bytes as the inner scalar; no new `By::` kind needed.
- `pampa/src/writers/json.rs:604, 625` (yaml-tagged-string Span wrappers around Glob / Expr values) → same fix; reuse the value's `source_info` for both wrapper and inner Str.
- `pampa/src/lua/types.rs` (8 sites: Lua-side inline construction helpers), `pampa/src/lua/utils.rs` (10 sites: Lua block_to_inlines LineBreak separators), `pampa/src/lua/readwrite.rs` (2 sites: Lua → ConfigValue conversion) → `By::unknown()`. These are Lua-side synthesis; the producer contract acknowledges that `filter_source_info` may overwrite with `Generated{by:filter,...}` on the way back out from a user filter.

Snapshot regenerated: `crates/pampa/snapshots/json/yaml-tags.snap` (1 file). The diff is correct-behavior: yaml-tagged-string spans now share the YAML source range with their inner content (previously the wrapper had `Original{0,0,0}`), so the writer's pool intern coalesces three references that used to be three default entries. No semantic regression: fewer dead pool entries, identical inline-level source tracking.

Remaining pampa residue (6 sites in `readers/json.rs`): all explicitly allowed by `provenance-contract.md` §10 (legacy Pandoc JSON backward-compat). They will need `#[allow(deprecated)]` when Phase 7's deprecation lands.

Workspace tests: 9737/9737 pass, 196 skipped.
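As a loose illustration of why sharing a source range shrinks the pool, here is a minimal interning sketch. The `Pool` and `SourceRange` types are hypothetical and are not the writer's actual data structures; the sketch only shows the general technique of returning an existing index for an already-seen value.

```rust
use std::collections::HashMap;

/// Hypothetical key for a pooled source location.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SourceRange {
    start: usize,
    end: usize,
}

/// Minimal intern pool: identical ranges share a single slot.
#[derive(Default)]
struct Pool {
    entries: Vec<SourceRange>,
    index: HashMap<SourceRange, usize>,
}

impl Pool {
    /// Return the existing slot for `range`, or append a new one.
    fn intern(&mut self, range: SourceRange) -> usize {
        if let Some(&id) = self.index.get(&range) {
            return id;
        }
        let id = self.entries.len();
        self.entries.push(range.clone());
        self.index.insert(range, id);
        id
    }
}
```

When a wrapper and its inner content carry the same range, both calls to `intern` return the same index, so the pool gains one entry instead of two.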

…quarto-config,quarto-citeproc): Phase 6.5 residue cleanup workspace-wide

Plan 7f Phase 6.5 extension: apply explicit `By::*` kinds across the remaining ~70 production sites the original plan didn't enumerate. Every non-test `SourceInfo::default()` in the workspace now has a deliberate provenance kind; the only retained sites are the 5 contract-allowed legacy-Pandoc-JSON sites in `crates/pampa/src/readers/json.rs`.

New `By::*` kinds (3) added in `crates/quarto-source-map/src/source_info.rs`:

- `By::jupyter_output()`: atomic. Synthesized blocks/inlines from kernel execution (Jupyter cell stdout / stderr, rich-display MIME bundles, error tracebacks). Regenerated on every re-run, so the preview's inline editor must not touch it.
- `By::callout()`: non-atomic. Wraps callout-decoration synthesis (default-title injection, screen-reader-only type announcement); the user's actual callout body stays editable through the preview. Atomicity decision per the worked example in `claude-notes/designs/provenance-contract.md` §3.
- `By::citeproc()` was added earlier in this phase and is reused here for `quarto-citeproc/src/output.rs:1274`.

Per-site application:

- `quarto-citeproc/src/output.rs:1274` (`empty_source_info()` helper) → `By::citeproc()`.
- `quarto-core/src/engine/context.rs:92` (`ExecutionContext::new`) → `By::unknown()`. Matching assertion in `engine_execution.rs:1378` updated.
- `quarto-core/src/engine/jupyter/{output.rs ×11, transform.rs ×1}` → `By::jupyter_output()`. Stream output, error tracebacks, MIME bundle conversion (text/plain, text/html, text/markdown, text/latex, image/* placeholders), and the inline-Code → Inline-Str expression-result swap in `transform.rs:279`.
- `quarto-core/src/transforms/callout_resolve.rs` (3 sites) → `By::callout()`. Default-title Str, screen-reader-only Span wrapper, both child source_infos.
- `quarto-core/src/transforms/shortcode_resolve.rs`: `config_value_to_inlines` (9 sites) + `lua_result_to_shortcode_result` (1 site) + `flatten_blocks_to_inlines` (1 inter-paragraph `Space` separator) reuse the surrounding `ConfigValue.source_info` or the shortcode token's source range so the canonical stamper pass downstream can wrap with the `Invocation` anchor. (The full enrichment chain, `Generated{by: shortcode, from: [Invocation]}`, happens at `stamp_block` / `stamp_inline`; this commit fixes the *innermost* synthesis sites.)
- `quarto-core/src/transforms/sidebar_auto.rs` (4), `categories_sidebar.rs` (3), `sidebar_render.rs` (2), `sidebar_generate.rs` (1), `page_nav_render.rs` (1), `navbar_render.rs` (1), `footer_render.rs` (1), `toc_render.rs` (1), `listing_render.rs` (1), `navigation_enrich.rs` (1) → `By::programmatic_config()`. All synthesizing config-storage of rendered-HTML strings or navigation items.
- `quarto-core/src/stage/stages/metadata_merge.rs` (4), `listing_item_info.rs` (2), `math_js.rs` (1) → `By::programmatic_config()`. Stage-processing intermediates where source bytes don't exist.
- `quarto-core/src/project/listing/feed/{stage.rs, complete.rs}`, `listing/post_render_upgrade/substitute.rs` (five diagnostic builders) → `By::unknown()`. Span-less diagnostics degrade gracefully through the existing `with_location` formatter.
- `quarto-core/src/project/listing/config.rs:113` (`Listing::default().categories_source`) → `By::programmatic_config()`. Doc comment updated.
- `quarto-config/src/materialize.rs` (3 sites: `key_source`, `MergedValue::Map` source_info fallback, missing-path `ConfigValue::null`) → `By::programmatic_config()` / `By::unknown()` per site.
- `quarto-analysis/src/transforms/shortcode.rs` (7 sites): reuse the shortcode token's source range; same pattern as the canonical `shortcode_resolve.rs` enrichment, in the simpler static-analysis form.
- `quarto-navigation/src/{page_nav,navbar,sidebar,footer,item}.rs` (16 sites) → `By::programmatic_config()`. Navigation items synthesized without YAML source context.
- `quarto-core/src/transforms/theorem.rs:312` doc-comment update (the actual fall-back recipe is `None`, not `SourceInfo::default()`, in the post-Phase-6.5 code).

Doc-comment-only references in `shortcode_resolve.rs:172` and `navigation_href.rs:381` retained as historical references; they describe pre-Phase-6.5 behavior.

Workspace tests: 9739/9739 pass, 196 skipped (3 new tests for the new `By::*` kinds in `quarto-source-map`).
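The atomicity split for the kinds named in these commits could be modelled as below. The kind strings `"citeproc"`, `"jupyter-output"`, and `"callout"` are guesses at the serialized names, and the real `is_atomic_kind()` presumably covers additional kinds that the commits here do not list, so treat this as an illustration rather than the actual match arm.

```rust
/// Sketch of the atomic/non-atomic split for kinds named in this PR.
/// Kind strings for citeproc, jupyter output, and callout are assumed.
fn is_atomic_kind(kind: &str) -> bool {
    // Atomic: regenerated by a processor, so inline edits must be blocked.
    matches!(kind, "citeproc" | "jupyter-output")
}

/// Hypothetical helper: content is editable in the preview only when its
/// provenance kind is not atomic.
fn is_editable_in_preview(kind: &str) -> bool {
    !is_atomic_kind(kind)
}
```

Under this model a callout's synthesized decoration stays editable, while kernel output and rendered citations are blocked from inline editing.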

Updates CURRENT.md to reflect that the discovered production residue (~70 unplanned sites) was addressed inline rather than deferred to Phase 7. Three new `By::*` kinds were defined during the sweep: `By::citeproc()`, `By::jupyter_output()`, `By::callout()`. After this commit, only 6 production `SourceInfo::default()` callers remain — 5 contract-allowed legacy-Pandoc-JSON sites in `pampa/src/readers/json.rs` and the `impl Default for SourceInfo` body itself. Full `cargo xtask verify` passes all 12 steps including WASM/SPA. Phase 7's compiler-driven audit now has a much smaller surface to cover — most of the heavy lifting moved into Phase 6.5.

gordonwoodhull closed this May 25, 2026

gordonwoodhull reopened this May 25, 2026

gordonwoodhull force-pushed the feature/provenance branch from 318ab48 to 4ee51e4 Compare May 25, 2026 03:10

cscheid mentioned this pull request May 28, 2026: pampa: migrate FileId scheme from sequential to hash-based (bd-ky14a) #235 (Open, 4 tasks)

gordonwoodhull force-pushed the feature/provenance branch from 760dacd to c9dcc6f Compare June 1, 2026 14:11

gordonwoodhull added 21 commits June 1, 2026 16:07

gordonwoodhull and others added 29 commits June 1, 2026 16:07

docs(hub-client): changelog for d336daa (Plan 7f Phase 2 — q2-debug figure s: preservation) 9d2ad2b

gordonwoodhull force-pushed the feature/provenance branch from c9dcc6f to 88ef2ad Compare June 1, 2026 20:14

feature: improve provenance and make q2-preview editable #231
gordonwoodhull wants to merge 104 commits into main from feature/provenance
